PEPTIDE LIBRARIES AS A SOURCE OF SYNGENES

This application is a continuation-in-part 
of co-pending U.S. Patent Application Serial No. 
08/292,902 filed August 18, 1994, the entire
disclosure of which is incorporated herein by 
reference. 

1. FIELD OF THE INVENTION

The present invention relates generally to
synthetic gene sequences ("syngenes"), particularly
for use in gene therapy. Syngenes are nucleic acids
that comprise synthetic gene sequences identified by
screening synthetic random peptide libraries for
peptides that bind a ligand of choice. The synthetic
gene sequences, together with, optionally, other DNA
sequences that target the synthetic gene sequences or
their encoded proteins to particular locations in vivo
or intracellularly, or that contain processing
signals, or that code for other peptides or amino acid
sequences, are cloned into suitable expression
vectors. The syngenes are used, for example, in gene
therapy to supply, via expression of their encoded
proteins, a therapeutic product. In another aspect,
the invention relates to protein or peptide products
of syngenes and their therapeutic and diagnostic uses.

2. BACKGROUND OF THE INVENTION

Gene therapy today is often directed to the
replacement of a product of a defective gene. For
example, a human adenosine deaminase gene has been
engineered into skin fibroblasts of mice through the
use of a retroviral vector (Palmer et al., 1988,
Proc. Natl. Acad. Sci. USA 88:1330-1334). The human
β-globin gene has been transferred into mouse bone
marrow cells and mouse erythroleukemia cells as a
model for the treatment of β-thalassemia and sickle
cell anemia (Novak et al., 1990, Proc. Natl. Acad.
Sci. USA 87:3386-3390; Bender et al., 1988, Mol. Cell.
Biol. 8:1725-1735; Chao et al., 1983, Cell 32:483-
493).

Other examples of the replacement of a
defective gene product include: the delivery of the
human cystic fibrosis transmembrane conductance
regulator via a retrovirus into airway epithelia of
mice (Hyde et al., 1993, Nature 362:250-255); in a
rabbit model, the transfer of human low density
lipoprotein receptor cDNA into hepatocytes (Wilson et
al., 1992, J. Biol. Chem. 267:963-967); and the transfer
of human Factor IX cDNA into skin fibroblasts,
followed by engraftment of the fibroblasts into mice
(Louis and Verma, 1988, Proc. Natl. Acad. Sci. USA
85:3150-3154).

In addition to replacing a defective gene
product, gene therapy can also be used to provide a
cell or an organism with a new function. Examples of
such a use include: transferring the human growth
hormone gene via a retrovirus into human keratinocytes
and then grafting the keratinocytes into nude mice
(Morgan et al., 1987, Science 237:1476-1479);
expressing therapeutics for cardiovascular disease in
endothelial cells with the aim of delivering the
therapeutics through the circulation (Nabel et al.,
1989, Science 244:1342-1344); and the use of antisense
c-myb oligonucleotides to prevent the proliferation of
vascular smooth muscle cells (Simons et al., 1992,
Nature 359:67-70).

Another aspect of gene therapy involves the
inhibition or enhancement of the activity of a
preselected gene. The preselected gene is one that is
involved in some aspect of a diseased state. The
ability to modulate the activity of such a gene would
be useful in the treatment of the diseased state. In
this aspect, gene therapy is limited by the
availability of specific modulators of specific target
genes. Currently, there is no easy method of
increasing the specificity of known modulators or
discovering new modulators. It would be of great
value to have reagents and methods for identifying
such modulators. This would permit virtually any gene
to be the subject of this aspect of gene therapy.

Currently, all gene therapy programs use
either 1) cloned genes that are endogenous in some
organism or 2) genes encoding mRNAs that are antisense
to known genes. Current methods of gene therapy
usually utilize human genes, either in their sense or
antisense orientations. It would be of great value
to have a source of genes other than the human genome,
since the use of the human genome presents certain
problems. Using the human genome entails the
laborious identification and
isolation of a gene having a desired function. A
further problem is the possibility that no human gene
possesses the desired function. Even if a human
gene is found that has the desired function, that gene
may not have the desired specificity; it may be too
large to be easily used in current gene therapy
protocols; it may be encoded by multiple exons; and
fragments of the gene may not be suitable for use in
gene therapy because the gene's binding regions may be
too complex for fragments of the endogenous protein to
mimic its function. In order to avoid these problems,
it would be highly desirable to have a method of
producing genes for use in gene therapy that does not
rely on the human genome as a source of those genes.



2.2.

The use of peptide libraries is well known
in the art. Such peptide libraries have generally
been constructed by one of two approaches. According
to one approach, peptides have been chemically
synthesized in vitro in several formats. For example,
Fodor, S., et al., 1991, Science 251:767-773,
describes use of complex instrumentation,
photochemistry and computerized inventory control to
synthesize a known array of short peptides on an
individual microscopic slide. Houghten, R., et al.,
1991, Nature 354:84-86, describes mixtures of free
hexapeptides in which the first and second residues in
each peptide were individually and specifically
defined. Lam, K., et al., 1991, Nature 354:82-84,
describes a "one bead, one peptide" approach in which
a solid phase split synthesis scheme produced a
library of peptides in which each bead in the
collection had immobilized thereon a single, random
sequence of amino acid residues. For the most part,
the chemical synthetic systems have been directed to
generation of arrays of short length peptides,
generally fewer than about 10 amino acids, more
particularly about 6-8 amino acids. Direct amino acid
sequencing, alone or in combination with complex
record keeping of the peptide synthesis schemes, is
required to use these libraries.

According to a second approach using
recombinant DNA techniques, peptides have been
expressed in biological systems as either soluble
fusion proteins or viral capsid fusion proteins.

A number of peptide libraries according to
the second approach have used the M13 phage. M13 is a
filamentous bacteriophage that has been a workhorse in
molecular biology laboratories for the past 20 years.
M13 viral particles consist of six different capsid
proteins and one copy of the viral genome, as a
single-stranded circular DNA molecule. Once the M13
DNA has been introduced into a host cell such as E.
coli, it is converted into double-stranded, circular
DNA. The viral DNA carries a second origin of
replication that is used to generate the single-
stranded DNA found in the viral particles. During
viral morphogenesis, there is an ordered assembly of
the single-stranded DNA and the viral proteins, and
the viral particles are extruded from cells in a
process much like secretion. The M13 virus is neither
lysogenic nor lytic like other bacteriophage (e.g.,
λ); cells, once infected, chronically release virus.
This feature leads to high titers of virus in infected
cultures, i.e., 10^12 pfu/ml.

The genome of the M13 phage is ~8000
nucleotides in length and has been completely
sequenced. The viral capsid protein, protein III
(pIII), is responsible for infection of bacteria. In
E. coli, the pilin protein encoded by the F factor
interacts with pIII protein and is responsible for
phage uptake. Hence, all E. coli hosts for M13 virus
are considered male because they carry the F factor.
Several investigators have determined from mutational
analysis that the 406 amino acid long pIII capsid
protein has two domains. The C-terminus anchors the
protein to the viral coat, while portions of the
N-terminus of pIII are essential for interaction with
the E. coli pilin protein (Crissman, J.W. and Smith,
G.P., 1984, Virology 132:445-455). Although the
N-terminus of the pIII protein has been shown to be
necessary for viral infection, the extreme N-terminus
of the mature protein does tolerate alterations. In
1985, George Smith published experiments reporting the
use of the pIII protein of bacteriophage M13 as an
experimental system for expressing a heterologous
protein on the viral coat surface (Smith, G.P., 1985,
Science 228:1315-1317). It was later recognized,
independently by two groups, that the M13 phage pIII
gene display system could be a useful one for mapping
antibody epitopes. De la Cruz, V., et al., (1988,
J. Biol. Chem. 263:4318-4322) cloned and expressed
segments of the cDNA encoding the Plasmodium
falciparum surface coat protein into the pIII gene,
and recombinant phage were tested for immunoreactivity
with a polyclonal antibody. Parmley, S.F. and Smith,
G.P., (1988, Gene 73:305-318) cloned and expressed
segments of the E. coli β-galactosidase gene in the
pIII gene and identified recombinants carrying the
epitope of an anti-β-galactosidase monoclonal
antibody. The latter authors also described a process
termed "biopanning", in which mixtures of recombinant
phage were incubated with biotinylated monoclonal
antibodies, and phage-antibody complexes could be
specifically recovered with streptavidin-coated
plastic plates.

Parmley and Smith, 1989, Adv. Exp.
Med. Biol. 251:215-218, suggested that short, synthetic
DNA segments cloned into the pIII gene might represent
a library of epitopes. These authors reasoned that
since linear epitopes were often ~6 amino acids in
length, it should be possible to use a random
recombinant DNA library to express all possible
hexapeptides to isolate epitopes that bind to
antibodies.

Scott and Smith, 1990, Science 249:386-390
describe construction and expression of an "epitope
library" of hexapeptides on the surface of M13. The
library was made by inserting a 33 base pair Bgl I
digested oligonucleotide sequence into an Sfi I
digested phage fd-tet, i.e., fUSE5 RF. The 33 base
pair fragment contains a random or "degenerate" coding
sequence (NNK)6 (SEQ ID NO: 1) where N represents G, A,
T or C and K represents G or T. The authors stated
that the library consisted of 2 x 10^8 recombinants
expressing 4 x 10^7 different hexapeptides;
theoretically, this library expressed 69% of the
6.4 x 10^7 possible peptides (20^6). Cwirla et al.,
1990, Proc. Natl. Acad. Sci. USA 87:6378-6382 also
described a somewhat similar library of hexapeptides
expressed as pIII gene fusions of M13 fd phage. PCT
publication WO 91/19818 dated December 26, 1991 by
Dower and Cwirla describes a similar library of
pentameric to octameric random amino acid sequences.
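
As an illustrative aside, not taken from the cited work, the
properties of degenerate codon schemes such as NNK can be
checked with a short Python sketch. It assumes the standard
genetic code; the names CODON_TABLE, DEGENERATE, expand and
summarize are introduced here only for illustration.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Degenerate symbols used in this section: N = any base, K = G or T, S = G or C.
DEGENERATE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "N": "ACGT", "K": "GT", "S": "CG"}


def expand(scheme):
    """List every concrete codon covered by a three-letter degenerate scheme."""
    return ["".join(c) for c in product(*(DEGENERATE[s] for s in scheme))]


def summarize(scheme):
    """Return (number of codons, distinct amino acids encoded, stop codons)."""
    encoded = [CODON_TABLE[c] for c in expand(scheme)]
    amino_acids = set(encoded) - {"*"}
    return len(encoded), len(amino_acids), encoded.count("*")


for scheme in ("NNN", "NNK", "NNS"):
    print(scheme, summarize(scheme))

# Number of distinct hexapeptides over 20 amino acids.
print(20 ** 6)
```

Under these assumptions, NNK and NNS each cover 32 codons that
still encode all 20 amino acids while retaining only one of the
three stop codons, and 20^6 reproduces the 6.4 x 10^7 possible
hexapeptides stated above.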

Devlin et al., 1990, Science, 249:404-406, 
describes a peptide library of about 15 residues 
generated using an (NNS) coding scheme for 
oligonucleotide synthesis in which S is G or C. 

Christian et al., 1992, J. Mol.
Biol. 227:711-718 described a phage display library
expressing decapeptides. The starting DNA was
generated by means of an oligonucleotide comprising
the degenerate codons [NN(G/T)]10 (SEQ ID NO: 2) with a
self-complementary 3' terminus. This sequence, in
forming a hairpin, creates a self-priming replication
site which could be used by T4 DNA polymerase to
generate the complementary strand. The double-
stranded DNA was cleaved at the Sfi I sites at the 5'
terminus and hairpin for cloning into the fUSE5 vector
described by Scott and Smith, supra.

Other investigators have used other viral
capsid proteins for expression of non-viral DNA on the
surface of phage particles. The protein pVIII is a
major M13 viral capsid protein and interacts with the
single-stranded DNA of M13 viral particles at its
C-terminus. It is 50 amino acids long and exists in
approximately 2,700 copies per particle. The
N-terminus of the protein is exposed and will tolerate
insertions, although large inserts have been reported
to disrupt the assembly of pVIII fusion proteins into
viral particles (Cesareni, 1992, FEBS Lett. 307:66-
70). To minimize the negative effect of pVIII fusion
proteins, a phagemid system has been utilized.
Bacterial cells carrying the phagemid are infected
with helper phage and secrete viral particles that
have a mixture of both wild-type and pVIII fusion
capsid molecules. pVIII has also served as a site for
expressing peptides on the surface of M13 viral
particles. Four and six amino acid sequences
corresponding to different segments of the Plasmodium
falciparum major surface antigen have been cloned and
expressed in the comparable gene of the filamentous
bacteriophage fd (Greenwood et al., 1991, J. Mol.
Biol. 220:821-827).

Lenstra, 1992, J. Immunol. Meth. 152:149-157
describes construction of a library by a laborious
process encompassing annealing oligonucleotides of
about 17 or 23 degenerate bases with an 8 nucleotide
long, palindromic sequence at their 3' ends. This
resulted in the expression of random hexa- or
octapeptides as fusion proteins with the β-galactosidase
protein in a bacterial expression vector. The DNA was
then converted into a double-stranded form with Klenow
DNA polymerase, blunt-end ligated into a vector, and
then released as Hind III fragments. These fragments
were then cloned into an expression vector at the
C-terminus of a truncated β-galactosidase to generate
10^7 recombinants. Colonies were then lysed, blotted on
nitrocellulose filters (10^4/filter) and screened for
immunoreactivity with several different monoclonal
antibodies. A number of clones were isolated by
repeated rounds of screening and were sequenced.


Screening of peptide libraries has generally
been confined to the use of a restricted number of
ligands. Most commonly, the ligand has been an
antibody (Parmley and Smith, 1989, Adv. Exp. Med.
Biol. 251:215-218; Scott and Smith, 1990,
Science 249:386-390). Streptavidin (Fowlkes et al.,
1992, BioTechniques 13:422-427) and concanavalin A
(Oldenburg et al., 1992, Proc. Natl. Acad. Sci. USA
89:5393-5397) have also been used as ligands. SH2 and
SH3 domains have also been used (Yu et al., 1994, Cell
76:933-945).

Oligonucleotides have been used to screen a
conventional cDNA library (Staudt et al., 1988,
Science 241:577-580). This resulted in the isolation
of naturally occurring DNA binding proteins. Also,
random DNA libraries have been screened for ligand
binding properties (Bock et al., 1992, Nature 355:564-
566; Tuerk et al., 1992, Proc. Natl. Acad. Sci. USA
89:6988-6992; Ellington et al., 1992, Nature 355:850-
852). U.S. Patent No. 5,096,815, U.S. Patent No.
5,223,409, and U.S. Patent No. 5,198,346, all to
Ladner et al., disclose the use of oligonucleotide
ligands to screen a phage display library in which
known, naturally occurring DNA binding proteins have
been cloned. Following mutagenesis of certain defined
positions in the sequences of the naturally occurring
DNA binding proteins, stronger binding versions of
those proteins were isolated. Another example of the
screening of a non-random phage display library with
an oligonucleotide is given in Rebar and Pabo, 1993,
Science 263:671-673. This work resulted in the
isolation of variants of known zinc finger DNA binding
proteins. Rebar and Pabo do not disclose the use of
random peptide libraries that are totally synthetic.
As yet, oligonucleotides have not been used to screen
a totally synthetic, random, phage display peptide
library.

A common goal of screening peptide libraries
has generally been to find a peptide with a desired
biological effect and (1) use that peptide directly;
(2) use the peptide as a basis for designing
peptidomimetics for therapeutic use; or (3) use the
peptide for research to map protein/target
interactions, such as epitope mapping. Under current
practice, once an appropriate peptide is identified
from a library, that peptide, or a peptidomimetic of
it, may be used therapeutically. However, use of such
peptides or peptidomimetics may lead to problems of
peptide instability, delivery problems, or difficulty
in making the peptidomimetic. The use of the syngenes
of the present invention can obviate such
difficulties. The use of syngenes is especially
advantageous in cases where use of a long peptide (>20
amino acids) is necessary. In such cases, making an
appropriate peptidomimetic is difficult or impossible.

U.S. Patent 5,198,346 to Ladner et al.
("Ladner"), at column 20, suggests the in vivo
therapeutic use of genes encoding certain DNA binding
proteins selected from peptide display libraries.
Ladner does not, however, disclose the use of totally
synthetic random peptide libraries. In contrast to
the syngenes of the present invention, Ladner stresses
the use of peptide display libraries that are derived
from naturally occurring sequences (in this instance,
sequences encoding known binding proteins) by in vitro
mutagenesis of these sequences. Prior to the present
invention, it had not been recognized that totally
synthetic random peptide libraries are a tremendous
source of synthetic genes.

Citation or identification of any reference
herein shall not be construed as an admission that
such reference is available as prior art to the
present invention.

3. SUMMARY OF THE INVENTION 

The present invention relates generally to
synthetic gene sequences ("syngenes"), particularly
for use in gene therapy. Syngenes are nucleic acids
that comprise synthetic gene sequences identified by
screening random peptide libraries for peptides that
bind a ligand of choice. The synthetic gene sequences
are cloned into suitable expression vectors together
with, optionally, other DNA sequences that target the
synthetic gene sequences or their encoded proteins to
particular locations in vivo or intracellularly, or
that contain processing signals, or that code for
other peptides or amino acid sequences. The syngenes
are used, for example, in gene therapy to supply a
therapeutic product via expression of their encoded
proteins. In another aspect, the invention relates to
protein or peptide products of syngenes and their
therapeutic and diagnostic uses.

Pharmaceutical compositions comprising
syngenes or their encoded peptides are also provided.

In one embodiment, the invention provides
methods of identifying syngenes that bind to a
particular ligand of choice. In specific aspects, the
ligand of choice is a transcriptional regulatory site,
a transcription factor that binds to a transcriptional
regulatory site, or a binding partner/inhibitor of the
transcription factor. In one aspect, such a
transcriptional regulatory site is an NF-κB, AP-1, or
ATF binding site. In other aspects, the transcription
factor is NF-κB, AP-1, or ATF. Accordingly, in a
specific embodiment, the present invention provides
methods for identifying syngenes that inhibit or
enhance the transcriptional activity of a wide variety
of naturally occurring genes, preferably with a
specificity not found in natural systems. Syngenes
are also useful for modulating signal transduction
pathways, metabolic pathways, RNA translation, and
intracellular trafficking. In the cell membrane,
syngenes may be used to modulate the activity of
membrane receptors, ion channels, or exocytotic and
endocytotic pathways. In tissue, syngenes, via
expression of their encoded proteins, may be used to
regulate cell/cell signalling and transcytosis.
Cell/cell junctions and the extracellular matrix are
appropriate targets for syngenes. Syngenes may be
used to regulate cell adhesion or cell/cell
recognition. In the general circulation, syngenes may
be used to regulate the activity of receptor ligands.

4. FIGURE LEGENDS

The present invention may be understood more
fully by reference to the following detailed
description of the invention, examples of specific
embodiments of the invention and the appended figures
in which:

Figure 1 (A-F) schematically illustrates
construction of TSAR libraries. Figure 1A
schematically depicts the synthesis and assembly of
synthetic oligonucleotides for the linear libraries
and bimolecular libraries illustrated in Figures 1B and
1C. N = A, C, G or T; B = G, T or C and V = G, A, or
C; and n and m are integers, such that 10 ≤ n ≤ 100
and 10 ≤ m ≤ 100. Figure 1D-F schematically depicts
representative libraries which are designed to be
semirigid libraries. The synthesis and assembly of
the oligonucleotides for the semirigid libraries are
as in Figure 1A with modifications to include
specified invariant positions. See Section 5.1.3.1.

Figure 2 schematically depicts an exemplary
mRNA expressed by a syngene.

Figure 3 is a schematic depiction of a
shuttle vector which can be used in one embodiment of
the invention. (1) and (2) allow replication in E.
coli and mammalian cells, respectively. (3) allows
selection in E. coli. (4) and (5) allow transcription
and mRNA processing, respectively, in mammalian
cells. (6) is a syngene that preferably encodes
amino-terminal spacer sequences, binding domain,
targeting/localization signal, and, optionally, a
second functional domain.

Figure 4 presents the results of ELISAs on
phage clones isolated from the R26 library using the
H2κB oligonucleotide as a target. See Section 6.1.4
for details. Numbers along the abscissa refer to
phage clones as follows: 1 = clone 1; 2 = clone 2; 3 =
clone 6; 4 = clone 5; 5 = m663; 6 = a random clone.
H2κB, IL6κB, IL8κB, and NFIL6 refer to oligonucleotide
ligands as described in Section 6.1.1. Phage clones
were tested for binding to plates coated with the
oligonucleotide ligands. Phage clones were also
tested for binding to plates coated with BSA. The
height of the bars above the numbers along the
abscissa represents the ratio of a phage clone's
binding to an oligonucleotide ligand-coated plate
compared to the phage clone's binding to the BSA-
coated plate.
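
The ratio plotted in Figure 4 can be sketched in Python as
follows. This is only an illustration of the arithmetic: the
function name binding_ratio, the clone labels and the signal
values are hypothetical and are not data from the figure.

```python
def binding_ratio(signal_ligand, signal_bsa):
    """Ratio of the ELISA signal on a ligand-coated plate to the signal
    on a BSA-coated control plate for the same phage clone."""
    if signal_bsa <= 0:
        raise ValueError("BSA control signal must be positive")
    return signal_ligand / signal_bsa


# Hypothetical readings: (ligand-coated plate, BSA-coated plate).
readings = {
    "binding clone": (1.20, 0.10),
    "random clone": (0.11, 0.10),
}

ratios = {name: binding_ratio(*pair) for name, pair in readings.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}")
```

A ratio near 1 would indicate binding no better than background
binding to BSA, while a larger ratio would indicate preferential
binding to the oligonucleotide ligand.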

Figure 5 schematically illustrates
construction of the TSAR-9 library. N = A, C, G or T;
B = G, T or C and V = G, A or C.
GGCTCGAGN(NNB)18CCAGGT is SEQ ID NO: 3;
GGTCTAGA(VNN)18ACCTGG is SEQ ID NO: 4;
TCGAGN(NNB)18CCAGGT is SEQ ID NO: 5;
CTAGA(VNN)18ACCTGG is SEQ ID NO: 6;
SHSS(R/S)X18PGX18SRPSRT is SEQ ID NO: 7.
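
The relationship between the two TSAR-9 oligonucleotides can be
sketched in Python. The sketch assumes standard Watson-Crick
pairing extended to the degenerate symbols N, B and V defined in
the legend; the names COMPLEMENT and reverse_complement are
illustrative only.

```python
# Complement of each symbol; B (G, T or C) pairs with V (C, A or G).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
              "N": "N", "B": "V", "V": "B"}


def reverse_complement(seq):
    """Return the reverse complement of a sequence that may contain N, B or V."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


# The fixed 3' ends of the two oligonucleotides are complementary.
print(reverse_complement("ACCTGG"))

# A VNN codon on one strand reads as NNB on the opposite strand.
print(reverse_complement("VNN"))
```

On these assumptions the 3' terminal ACCTGG of one
oligonucleotide is the reverse complement of the 3' terminal
CCAGGT of the other, and VNN units read as NNB codons on the
opposite strand, so both halves of the assembled insert present
NNB codons in the coding direction.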

Figure 6 schematically illustrates the m663
expression vector.

Figure 7 schematically illustrates
construction of the TSAR-12 library. N = A, C, G or
T; B = G, T or C and V = G, A or C. Insertion into a
representative, appropriate vector and expression in
an appropriate host is illustrated.
ttttgtcgacN(NNB)10Ngcggtg is SEQ ID NO: 8;
ttttactagt(VNN)10VNcaccgc is SEQ ID NO: 9;
tcgacN(NNB)10Ngcggtg is SEQ ID NO: 10;
ctagt(VNN)10VNcaccgc is SEQ ID NO: 11.

Figure 8 schematically illustrates
construction of the TSAR-13 library.
TGACGACTCGAGTTGTGGT(NNK)8GGTTGTGGA(NNK)8GGGTGCGGC is
SEQ ID NO: 12; GATCCTTCTAGAACCTGGAGGCCCACAGCC(MNN)8GCCGCACCC
is SEQ ID NO: 13;
TCGAGTTGTGGT(NNK)8GGTTGTGGA(NNK)8GGGTGCGGC is SEQ ID NO: 14;
CTAGAACCTGGAGGCCCACAGCC(MNN)8GCCGCACCC is SEQ ID NO: 15;
SSCGX8GCGX8GCGX8GCGPPGSR is SEQ ID NO: 16.
Figure 9 schematically depicts the
construction of the R26 library. The R26 expression
library was constructed essentially as described for
the TSAR-9 library that is described in Section 6.9.1
and its subsections, except for the modifications
depicted in Figure 9. ctgtgcctcgagB(NNB)12Nccgcgg is
SEQ ID NO: 17; ctgtgctctaga(VNN)12VNccgcgg is SEQ ID
NO: 18; tcgagB(NNB)12Nccgcgg is SEQ ID NO: 19;
ctaga(VNN)12VNccgcgg is SEQ ID NO: 20;
SHSS(S/R)X12πAδX12SRPSRT is SEQ ID NO: 21.

Figure 10 (A-D) represents circular
restriction maps of phagemid vectors, derived from
phagemid pBluescript II SK\ in which a truncated
portion encoding amino acid residues 198-406 of the
pIII gene of M13 is linked to a leader sequence of the
E. coli Pel B gene and is expressed under control of a
lac promoter. G and S represent the amino acids
glycine and serine, respectively; c-myc represents the
human c-myc oncogene epitope recognized by the 9E10
monoclonal antibody described in Evan et al., 1985,
Mol. Cell. Biol. 5:3610-3616. Figure 10A illustrates
the restriction map of phagemid pDAF1; Figure 10B
illustrates the restriction map of phagemid pDAF2;
Figure 10C illustrates the restriction map of phagemid
pDAF3; Figure 10D schematically illustrates the
construction of phagemids pDAF1, pDAF2 and pDAF3.

Figure 11 schematically depicts the
construction of the R8C library. The R8C expression
library was constructed essentially as described for
the TSAR-9 library that is described in Section 6.9.1
and its subsections, except for the modifications
depicted in Figure 11.
TGACGTCTCGAGTTGT(NNK)8TGTGGATCTAGAAGGATC is SEQ ID NO:
22; GATCCTTCTAGATCC is SEQ ID NO: 23;
TCGAGTTGT(NNK)8TGTGGAT is SEQ ID NO: 24;
TCAGATCCACA(MNN)8ACAAC is SEQ ID NO: 25; SSCX8CGSRPSRT
is SEQ ID NO: 26.
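
As a rough consistency check, the R8C insert can be translated
in Python to see how the fixed flanking codons relate to the
peptide given in the legend. The sketch assumes the standard
genetic code, eight NNK codons, and a reading frame starting at
the TCGAG overhang (chosen here because it reproduces the
legend's peptide); the names CODON_TABLE, translate and
r8c_insert are illustrative only.

```python
from itertools import product

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate in frame 1; any codon with a degenerate base becomes X."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, usable, 3))


# Fixed 5' flank, eight degenerate NNK codons, fixed 3' flank through
# the Xba I site (TCTAGA), as found within the full-length oligonucleotide.
r8c_insert = "TCGAGTTGT" + "NNK" * 8 + "TGTGGATCTAGA"
print(translate(r8c_insert))
```

Under these assumptions the translation is SSC, eight variable
positions, then CGSR, which matches the start of the peptide
listed for the R8C library; the remaining C-terminal residues
would come from sequence outside this fragment.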
Figure 12 schematically depicts the origin
of one class of double-insert recombinants in the R8C
library. TCGAGTTGT(NNK)8TGTGGATCTAGATCCACA(MNN)8ACAAC
is SEQ ID NO: 27;
TCGAGTTGT(NNK)8TGTGGATCTAGATCCACA(MNN)8ACAAC is SEQ ID NO: 28;
SSCX8CGSRSTX8TTR is SEQ ID NO: 29.

Figure 13 schematically illustrates the
construction of the DC43 TSAR library. The DC43
expression library was constructed essentially as
described for the TSAR-9 library that is described in
Section 6.9.1 and its subsections, except for the
modifications depicted in Figure 13.
GTGTGTCTCGAGN(NNB)20GGTTGTGGT is SEQ ID NO: 30;
GTTGTGTCTAGA(VNN)20ACCACAACC is SEQ ID NO: 31;
TCGAGN(NNB)20GGTTGTGGT is SEQ ID NO: 32;
CTAGA(VNN)20ACCACAACC is SEQ ID NO: 33;
HSS(S/R)X20GCGX20SRIEGRARPSR is SEQ ID NO: 34.

Figure 14 shows the selectivity of binding
of two H2κB binding phage (H2κB-1 and H2κB-2), the
NFIL6 binding phage NFIL6-1, and the parental phage
m663 for three different target sites: the H2κB
oligonucleotide, the NFIL6 oligonucleotide, and the
IL6κB oligonucleotide. The numbers along the abscissa
refer to the various phage as follows: 1 = H2κB-1; 2 =
H2κB-2; 3 = m663; 4 = NFIL6-1.

Figure 15 illustrates the molecular
evolution scheme for phage H2κB-2 described in Section
6.1.5 that produced the ME#1 library. %p indicates
the approximate frequency at which the original amino
acid residue is expected to occur in the phage of the
ME#1 library. SSRTGNEQPPGSFGRAAGCFHPGCKYMKLN is SEQ
ID NO: 35; TCCTCGAGaacbggbaabgabcabccbccbggbtcbttbgg
bcgbgcbnCTGGTTnbttbcabccbggbtgbaabtabatbaabctbaab is
SEQ ID NO: 36; GTGTGTGTCTCGAGaacbggbaabgabcabcbccbgg
btcbttbggbcgbgcbnCTGGTTn is SEQ ID NO: 37; GTGTGTGTCT
CGAvttvagvttvatvtavttvcavccvggvtgvaavnAACCAGn is SEQ
ID NO: 38.

Figures 16A and 16B present the results of
phage ELISAs that show the relative binding avidity for
the H2κB oligonucleotide of clones isolated from the
ME#1 library as compared to the H2κB-2 clone.

5. DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to nucleic
acids ("syngenes") that comprise a random synthetic
nucleic acid sequence that encodes a protein, peptide,
or polypeptide (used interchangeably hereinafter and
most commonly collectively referred to as "peptide"
unless indicated otherwise explicitly or by context)
that binds to a ligand of choice, and their uses in
gene therapy for myriad diseases and disorders of
interest. The invention provides methods for
identification of syngenes that encode peptides that
specifically bind to a ligand of choice. Also
provided are compositions comprising syngenes as well
as uses of the syngenes, e.g., in diagnosis and
therapy of various disorders.

Syngenes encode a syngene product which is a
peptide having at least one functional domain. The
functional domain is a binding domain with affinity
for a ligand of choice.

The syngenes of the present invention are
synthetic genes, that is, genes that are not known to
be present in a naturally occurring genome. In a
specific embodiment, a syngene is a nucleic acid
encoding a peptide comprising a binding domain in
which the binding domain sequence is identified from a
peptide library comprising at least 5 unpredictable
contiguous amino acids in the variable portion of the
library. In another embodiment, syngenes are made up
of, at least in part, combinations of genes encoding
functional domains, such combinations not occurring in
nature. Syngenes may be composed of totally synthetic
gene sequences, combinations of natural and totally
synthetic gene sequences, or combinations of natural
DNA sequences juxtaposed so as not to form a gene
present in nature.

The present invention vastly expands the
number of genes that are available for use in gene
therapy. The present invention provides methods for
finding synthetic nucleic acids encoding peptides with
diagnostic or therapeutic value. The present
invention further provides methods for delivery of
such peptides in vivo by expression in vivo from the
administered syngene, methods that potentially avoid the
common problems of instability, clearance, etc. faced
when dosing with proteins directly. More importantly,
however, there are many instances where a protein that
possesses a desired biological effect is unknown. For
example, a known protein may not possess a desired
level of binding specificity for a particular ligand.
Syngenes encoding peptides with such novel
specificities can be advantageously identified by the
methods of the present invention.

The present invention provides methods for
identifying synthetic peptides, and the nucleic acids
encoding them, to fulfill roles for which naturally
occurring proteins have not necessarily evolved.
One need only identify a target whose activity
it would be desirable to modulate. Given such a
target, a synthetic peptide in a random peptide
library that binds the target and can thus modulate
the target's activity can be readily identified by the
present invention. The present invention also
provides for the identification of a nucleic acid
that encodes the peptide. Such a gene,
referred to as a syngene, can then be used, e.g., in
gene therapy. Alternatively, the syngene may be
introduced into an appropriate host cell and thereby
used for the recombinant production of its encoded
protein.

In a specific embodiment, the present
invention provides methods for identifying syngenes
that inhibit or enhance the transcriptional activity
of a wide variety of naturally occurring genes,
preferably with a specificity not found in natural
systems.

In addition to affecting transcription,
among the other nuclear processes that syngenes may be
useful in modulating are: post-transcriptional
processing of RNA, DNA repair, and DNA replication.

Syngenes are also useful for modulating
processes that occur outside of the nucleus. For
example, in the cytoplasm, the encoded protein of a
syngene, produced by intracellular expression of the
syngene, may be used to affect the activity of
inhibitors of transcription factors such as NF-kB.
Syngenes may also be used to modulate signal
transduction pathways, metabolic pathways, RNA
translation, and intracellular trafficking. In the
cell membrane, syngenes may be used to modulate the
activity of membrane receptors, ion channels, or
exocytotic and endocytotic pathways.

In tissue, syngenes, via expression of their
encoded proteins, may be used to regulate cell/cell
signalling and transcytosis. Cell/cell junctions and
the extracellular matrix are appropriate targets for
syngene expressed peptides. Syngenes may be used to
regulate cell adhesion or cell/cell recognition. In
the general circulation, syngenes may be used to
regulate the activity of receptor ligands.

In another embodiment, the invention
provides the peptides comprising binding domains that
are encoded by syngenes, as well as their therapeutic
and diagnostic uses, and compositions comprising such
peptides.






5.1. RANDOM PEPTIDE LIBRARIES FOR USE IN 
IDENTIFYING SYNTHETIC GENE SEQUENCES 
ENCODING A BINDING DOMAIN

Binding domains encoded by the syngenes of
the invention can be identified from a random peptide
expression library or a chemically synthesized random
peptide library. A nucleic acid which expresses a
peptide which binds to a ligand of choice can be
identified and recovered from a random peptide
expression library, and then sequenced to determine
its nucleotide sequence and hence its deduced amino
acid sequence that mediates binding. Alternatively,
the amino acid sequence of an appropriate binding
domain can be determined by direct determination of
the amino acid sequence of a peptide selected from a
random peptide library containing chemically
synthesized peptides, whereby an appropriate syngene
encoding the peptide can be designed and made. In a
less preferred aspect, direct amino acid sequencing of
a binding peptide selected from a random peptide
expression library can also be performed, and an
encoding nucleic acid designed. Where it is desired
to decrease the size of the syngene-encoded binding
domain, methods can be used to identify portions of
the determined synthetic amino acid or nucleotide
sequences which respectively mediate binding, or
encode the sequences which mediate binding, as
described in Section 5.3 below.

The term "random" peptide libraries is meant
to include within its scope libraries of both
partially and totally random peptides. Thus, peptide
sequences with a stretch of at least 5 unpredictable
amino acids as well as invariant amino acid sequences
are included within the scope of the random peptides.
The syngenes encoding such peptides will have both
codons of unpredictable and codons of invariant
sequence or (due to the degeneracy of the genetic
code) degenerate codons thereof. However, the binding
domains of syngenes are not cDNA or generated so as to
be genomic sequences. The syngenes of the invention
are not sequences generated by a method comprising
mutagenesis (even random mutagenesis) of a cDNA
sequence or portion thereof or genomic sequence or
portion thereof coding for a peptide having a
predetermined activity.

By the methods of the present invention, the
binding domains encoded by syngenes are advantageously
identified from random peptide libraries. Typically,
random peptide libraries will be encoded by synthetic
oligonucleotides with at least 15 contiguous variant
nucleotide positions having the potential to encode
all 20 naturally occurring amino acids. The sequence
of amino acids encoded by the variant nucleotides is
unpredictable and substantially random. The terms
"unpredicted", "unpredictable" and "substantially
random" are used interchangeably with respect to the
amino acids encoded and are intended to mean that the
variant nucleotides at any given position encoding the
binding domain of the syngene product are such that it
cannot be predicted which of the 20 naturally
occurring amino acids will appear at that position.
These variant nucleotides are the product of random
chemical synthesis. As will become clear, the
biological random peptide libraries envisioned for use
include those in which a bias has been introduced into
the random sequence, e.g., to disfavor stop codon
usage.

In a specific embodiment, a syngene of the
invention encodes a peptide comprising a binding
domain (which binds to a ligand of choice), in which
the nucleotide sequence encoding the binding domain is
a sequence identified by a method comprising screening
a library of recombinant vectors, said vectors
comprising unpredictable nucleotides arranged in one
or more contiguous sequences, wherein the total number
of unpredictable nucleotides is greater than or equal
to 15 and less than or equal to about 300. In other
specific embodiments, the total number of such
unpredictable nucleotides is in the range of 15-600,
24-600, 60-600, 15-120, 60-120, or 60-300.

5.1.1. CHEMICALLY SYNTHESIZED PEPTIDE LIBRARIES

The peptide libraries used in the present
invention may be libraries that are chemically
synthesized in vitro. Examples of such libraries are
given in Fodor, S., et al., 1991, Science 251:767-773,
which describes the synthesis of a known array of
short peptides on an individual microscopic slide;
Houghten, R., et al., 1991, Nature 354:84-86, which
describes mixtures of free hexapeptides in which the
first and second residues in each peptide were
individually and specifically defined; Lam, K., et
al., 1991, Nature 354:82-84, which describes a split
synthesis scheme; and Medynski, 1994, Bio/Technology
12:709-710, who describes split synthesis and T-bag
synthesis methods as well. See also Gallop et al.,
1994, J. Medicinal Chemistry 37(9):1233-1251.

Screening of chemically synthesized peptide 
libraries to identify peptides which bind to a ligand 
of choice can be carried out by methods well known in 
the art. 

In a specific embodiment, the total number
of unpredictable amino acids in the peptides of the
library used for screening is greater than or equal to
5 and less than or equal to 25; in other embodiments
the total is in the range of 5-15 or 5-10 amino acids,
preferably contiguous amino acids.

While a binding domain can be identified
from chemically synthesized peptide libraries and an
appropriate syngene synthesized to encode such a
binding domain, such domains would be small (i.e., less
than 10 amino acids, and most probably 5-6 amino
acids, in length). Therefore, this approach is less
preferred than the biological peptide libraries
containing unpredictable sequences of greater length,
described below.

5.1.2. BIOLOGICAL PEPTIDE LIBRARIES

In another embodiment, biological random
peptide libraries are used to identify a binding
domain which binds to a ligand of choice. Many
suitable biological random peptide libraries are known
in the art and can be used to screen for a peptide
that binds to a ligand of choice, according to
standard methods commonly known in the art.

According to this second approach, involving
recombinant DNA techniques, peptides have been
expressed in biological systems as either soluble
fusion proteins or viral capsid fusion proteins.

A number of peptide libraries according to
this approach have used the M13 phage. Although the
N-terminus of the viral capsid protein, protein III
(pIII), has been shown to be necessary for viral
infection, the extreme N-terminus of the mature
protein does tolerate alterations such as insertions.
Accordingly, various random peptide libraries, in
which the diverse peptides are expressed as pIII
fusion proteins, are known in the art; these libraries
can be used to identify syngene-encoded binding
domains by screening against a ligand of choice.
Examples of such libraries are described below.

Scott and Smith, 1990, Science 249:386-390
describe construction and expression of an "epitope
library" of hexapeptides on the surface of M13.
Cwirla et al., 1990, Proc. Natl. Acad. Sci. USA
87:6378-6382 also described a somewhat similar library of
hexapeptides expressed as pIII gene fusions of M13 fd
phage. PCT publication WO 91/19818 dated December 26,
1991 by Dower and Cwirla describes a similar library
of pentameric to octameric random amino acid
sequences. Devlin et al., 1990, Science 249:404-406,
described a peptide library of about 15 residues
generated using an (NNS) coding scheme for
oligonucleotide synthesis in which S is G or C.
Christian et al., 1992, J. Mol. Biol. 227:711-718
described a phage display library expressing
decapeptides. Lenstra, 1992, J. Immunol. Meth.
152:149-157 described construction of a library by a
laborious process encompassing annealing
oligonucleotides of about 17 or 23 degenerate bases
with an 8 nucleotide long palindromic sequence at
their 3' ends.
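
As a rough illustration of the (NNS) coding scheme attributed above to Devlin et al., the following Python sketch enumerates the codons that scheme permits and translates them with the standard genetic code. The helper names (CODE, scheme_codons) are illustrative only and do not come from the cited work.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def scheme_codons(third_bases):
    """Codons with any base at positions 1 and 2 and a restricted third base."""
    return ["".join(c) for c in product("ACGT", "ACGT", third_bases)]


nns = scheme_codons("GC")  # S is G or C
nns_stops = sorted(c for c in nns if CODE[c] == "*")
nns_amino = {CODE[c] for c in nns} - {"*"}

# Number of codons, the stop codons among them, and amino acids covered.
print(len(nns), nns_stops, len(nns_amino))
```

Under these assumptions the scheme yields 32 codons that still cover all 20 amino acids while retaining a single stop codon.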

Other biological peptide libraries which can
be used include those described in U.S. Patent No.
5,270,170, dated December 14, 1993; and PCT
Publication No. WO 91/19818, dated December 26, 1991.

In a specific embodiment, the R8C random
peptide library (described in Section 6.10 and Figures
11 and 12) is used. In another specific embodiment,
the R26 peptide library (described in Section 6.9.4
and Figure 9) is used. In yet another specific
embodiment, the DC43 peptide library (described in
Section 6.9.5 and Figure 13) is used.

The protein pVIII is a major M13 viral
capsid protein which can also serve as a site for
expressing peptides on the surface of M13 viral
particles in the construction of random peptide
libraries.

While it would be understood by one skilled
in the art that as few as 5 amino acids can constitute
a binding domain, the average functional domain within
a natural protein is considered to be about 40 amino
acids. Thus, the random peptide libraries from which
the binding domains encoded by the syngenes of the
present invention are preferably identified encode
peptides having in the range of 5 to 200 total variant
amino acids. Although it is contemplated that
biologically expressed random peptide libraries
displaying short random inserts (i.e., less than 20
amino acids in length) could be used to identify
syngenes of the invention, the most preferred binding
domains will be identified from biologically expressed
random peptide libraries in which the displayed
peptide has 20 or more unpredictable amino acids,
i.e., preferably in the range of 20 to 100, and most
preferably 20 to 50 amino acids, as exemplified by the
TSAR libraries described herein.

One of the objects of the present invention
is to provide syngenes encoding binding domains of
greater binding specificity than found in nature. To
accomplish this, the invention preferably uses
libraries of greater complexity than are commonly
employed in the art. The conventional teaching in the
random peptide library art is that the length of
inserted oligonucleotides should be kept short,
encoding preferably fewer than 15 and most preferably
about 6-8 amino acids. However, not only can
libraries encoding more than about 20 amino acids be
constructed, but such libraries can be advantageously
screened to identify peptides having binding
specificity for a variety of ligands. Such libraries
with longer length inserts are exemplified by the TSAR
libraries, described in detail hereinbelow in Section
5.1.3 and its subsections.

Libraries composed of longer length
oligonucleotides afford the ability to identify
peptides in which a short sequence of amino acids is
common to or shared by a number of peptides binding a
given ligand, i.e., library members having shared
binding motifs. The use of longer length libraries
also affords the ability to identify peptides which do
not have any shared sequences with other peptides but
which nevertheless have binding specificity for the
same ligand.

Libraries having large inserted
oligonucleotide sequences provide the opportunity to
identify or map binding sites which encompass not only
a few contiguous amino acid residues, i.e., simple
binding sites, but also those which encompass
discontinuous amino acids, i.e., complex binding
sites.

Additionally, the large size of the inserted
synthesized oligonucleotides of certain libraries
provides the opportunity for the development of
secondary and/or tertiary structure in the potential
binding peptides and in sequences flanking the actual
binding site in the binding domain. Secondary and
tertiary structure often significantly affect the
ability of a sequence to mediate binding, as well as
the strength and specificity of any binding which
occurs. Such complex structural developments are not
feasible when only small length oligonucleotides are
used.

Finally, as has been overlooked by the
conventional wisdom, longer length peptide libraries
provide a greatly enhanced complexity over shorter
length peptide libraries. This greatly enhanced
complexity is associated with the concept of sliding
windows which must be counted inclusively, i.e.,
number of windows = [length of sequence] - [window
size] + 1. This concept can be illustrated by
comparison of two libraries, as follows. Assume that
a binding site to a ligand requires 5 contiguous amino
acid residues (pentamer). In two libraries composed
of equal numbers of recombinants, a first library
expressing pentamers, and another library constructed
according to the present invention expressing 30-mers,
the second library will be 26 times "richer" in
binding sites relative to the first library (30 - 5 + 1
= 26 windows per 30-mer versus 1 per pentamer), with the
additional potential complexity contributed by
secondary or tertiary structure provided by flanking
regions.
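
The sliding-window count above can be checked with a short Python sketch. The function name window_count and the example peptide are illustrative assumptions and are not taken from the specification.

```python
def window_count(sequence_length, window_size):
    """Number of contiguous windows, counted inclusively."""
    if window_size > sequence_length:
        return 0
    return sequence_length - window_size + 1


def windows(peptide, window_size):
    """List every contiguous window of the given size in a peptide."""
    return [peptide[i:i + window_size]
            for i in range(window_count(len(peptide), window_size))]


pentamer_sites = window_count(5, 5)      # a pentamer holds one 5-residue window
thirty_mer_sites = window_count(30, 5)   # a 30-mer holds 30 - 5 + 1 windows
richness = thirty_mer_sites // pentamer_sites

# Hypothetical 30-residue peptide used only to show the enumeration.
example = "ACDEFGHIKLMNPQRSTVWYACDEFGHIKL"
print(pentamer_sites, thirty_mer_sites, richness, len(windows(example, 5)))
```

For equal numbers of recombinants, the count of candidate pentamer sites therefore scales by the factor computed in richness.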

Therefore, it is contemplated that the most
preferred binding domains and the syngenes encoding
them will be identified from biologically expressed
random peptide libraries in which the displayed
peptide is 20 or more amino acids in length.
Examples of such random peptide libraries are the TSAR
libraries described in detail in Section 5.1.3 and its
subsections.

The most preferred libraries for practicing
the invention are those that are generated or
constructed to express a plurality of heterofunctional
fusion peptides. The putative binding domains in
these peptides are not known to be naturally occurring
amino acid sequences or encoded by naturally occurring
nucleotide sequences. Thus, for example, the random
peptide libraries from which the binding domains
encoded by syngenes are identified are not cDNA or
genomic libraries. The sequence of any given peptide
from the preferred libraries cannot be predicted in
advance. In a preferred embodiment, the peptides are
expressed on the surface of the recombinant vectors of
the library.

In one embodiment, the random library is a
linear, non-constrained library. As would be
understood by one in the art having considered the
present disclosure, in another specific embodiment,
"constrained", "structured" or "semi-rigid" random
peptide libraries could also be used in the present
methods to identify binding domains encoded by
syngenes. Typically, these libraries express peptides
that are substantially random but contain a small
percentage of fixed residues within or flanking the
random sequences that have the result of conferring
structure or some degree of conformational rigidity to
the peptide. In a semirigid peptide library, the
plurality of synthetic oligonucleotides express
peptides that are each able to adopt only one or a
small number of different conformations that are
constrained by the positioning of codons encoding
certain structure conferring amino acids in or
flanking the synthesized variant or unpredicted
oligonucleotides. Unlike linear, unconstrained
libraries in which the plurality of proteins expressed
potentially adopt thousands of short-lived different
conformations, in a semirigid peptide library, the
plurality of proteins expressed can adopt only a
single or a small number of conformations. Such
libraries are exemplified by the TSAR-13 and TSAR-14
libraries described in Section 6.4 and its
subsections; by a library of random 6 amino acid
sequences, each flanked by invariant cysteine residues
(O'Neil et al., 1992, Proteins 14:509-515); by a
library of random 8 amino acid sequences, each flanked
by invariant cysteine residues as in the R8C library
described in Section 6.10 and Figures 11 and 12; and
by those libraries disclosed in PCT Publication No.
WO 94/11496, dated May 26, 1994.

The DNA encoding the random peptides of the
libraries can be synthetic or natural in origin. For
example, DNA can be prepared by a method similar to
methods used for constructing a cDNA library.
However, the DNA is then sheared and/or digested into
DNA pieces of 30 base pairs or smaller. The DNA
pieces are then ligated together, preferably to sizes
of about 120 base pairs, to create DNA sequences that
do not occur in nature. Peptides encoded by such
sequences can be expressed in a library and screened
for binding domains. However, as would be appreciated
by one of skill in the art, such DNA sequences will
contain stop codons and therefore would be a less
desirable source of coding sequences. It is noted
that such a library, by virtue of the juxtaposition of
cDNA sequences used in creating it, is not a cDNA
library.

It is not intended by the present invention
to use cDNA libraries for identifying syngenes. Such
libraries have been screened with oligonucleotides
(Staudt et al., 1988, Science 241:577-580; Singh et
al., 1988, Cell 52:415-423). These approaches yielded
naturally occurring DNA binding proteins. The object
of syngenes is to identify genes encoding sequences
that are not present in nature.

5.1.3. TSAR LIBRARIES 

In a preferred embodiment, a biological
peptide library that is a random peptide "TSAR"
library is screened to identify the synthetic gene
sequences encoding a binding domain, for use in
constructing a syngene. In this embodiment, the
syngenes of the present invention encode peptides
called TSARs which bind to a ligand of choice. TSARs
is an acronym for "Totally Synthetic Affinity
Reagents" as described in PCT publication WO 91/12328,
dated August 22, 1991. TSAR libraries, their
construction and use, and specific examples of TSAR
libraries are described in these publications and in
detail below. As described herein, nucleic acids
encoding TSARs or a TSAR portion which mediates
binding to the ligand of choice can be used to
identify and construct the syngenes of the present
invention.

As used in the present invention, a TSAR is
intended to encompass a concatenated heterofunctional
peptide that includes at least two distinct functional
regions. One region of the heterofunctional TSAR
molecule is a binding domain with affinity for a
ligand, that is preferably characterized by 1) its
strength of binding under specific conditions, 2) the
stability of its binding under specific conditions,
and 3) its selective specificity for the chosen
ligand. A second region of the heterofunctional TSAR
molecule is an effector domain which in specific
embodiments of the TSAR libraries is the pIII protein
or other structural protein providing for phage
display of the heterofunctional peptide. In other
embodiments, such a sequence can also include a
sequence that is biologically or chemically active to
enhance expression and/or detection and/or
purification of the TSAR. Such a sequence can be
chosen from a number of biologically or chemically
active proteins including a structural protein or
fragment that is accessibly expressed as a surface
protein of a vector, an enzyme or fragment thereof, a
toxin or fragment thereof, a therapeutic protein or
peptide, or a protein or peptide whose function is to
provide a site for attachment of a substance such as a
metal ion, etc., that is useful for enhancing
expression and/or detection and/or purification of the
expressed TSAR.

A TSAR can contain an optional additional
linker domain or region between the binding domain and
the effector domain. The linker region serves (1) as
a structural spacer region between the binding and
effector domains; (2) as an aid to uncouple or
separate the binding and effector domains; or (3) as a
structural aid for display of the binding domain
and/or the TSAR by the expression vector.

A TSAR may be a heterofunctional fusion
protein, said fusion protein comprising (a) a binding
domain encoded by an oligonucleotide comprising
unpredictable nucleotides in which the unpredictable
nucleotides are arranged in one or more contiguous
sequences, wherein the total number of unpredictable
nucleotides is greater than or equal to about 60 and
less than or equal to about 600, and optionally, (b)
an effector domain encoded by an oligonucleotide
sequence which is a protein or peptide that enhances
expression or detection of the binding domain.

Alternatively, a TSAR may be a heterofunctional
fusion protein comprising (i) a binding domain encoded
by a double stranded oligonucleotide comprising
unpredictable nucleotides in which the unpredictable
nucleotides are arranged in one or more contiguous
sequences, wherein the total number of unpredictable
nucleotides is greater than or equal to about 60 and
less than or equal to about 600 and the contiguous
sequences are flanked by invariant residues designed
to encode amino acids that confer a desired structure
to the binding domain of the expressed
heterofunctional fusion protein, and, optionally, (ii)
an effector domain encoded by an oligonucleotide
sequence encoding a protein or peptide that enhances
expression or detection of the binding domain.

5.1.3.1. CONSTRUCTION OF TSAR LIBRARIES 

In one embodiment of the present invention,
in order to identify and obtain the syngenes of the
present invention, use is made of TSAR libraries. In
order to prepare a library of vectors expressing a
plurality of protein TSARs according to one embodiment
of the present invention, single stranded sets of
nucleotides are synthesized and assembled in vitro, by
way of example, according to the following scheme.

The synthesized nucleotide sequences are
designed to have variant or unpredicted as well as
invariant nucleotide positions. Pairs of variant
nucleotides in which one individual member is
represented by 5' (NNB)n 3' and the other member is
represented by 3' (NNV)m 5', where N is A, C, G or T; B
is G, T or C; V is G, A or C; n is an integer, such
that 10 ≤ n ≤ 100; and m is an integer, such that 10 ≤
m ≤ 100, are synthesized for assembly into synthetic
oligonucleotides. As assembled, there are at least n
+ m variant codons in each inserted synthesized double
stranded oligonucleotide sequence.

As would be understood by those of skill in 
the art, the variant nucleotide positions have the 
potential to encode all 20 naturally occurring amino 
acids and, when assembled as taught by the present 
method, encode only one stop codon, i.e., TAG. The 
sequence of amino acids encoded by the variant 
nucleotides is unpredictable and substantially random 
in sequence. As explained hereinabove, the terms 
"unpredicted", "unpredictable" and "substantially 
random" are used interchangeably in the present 
application with respect to the amino acids encoded 
and are intended to mean that at any given position 
within the binding domain of the TSARs encoded by the 
variant nucleotides which of the 20 naturally 
occurring amino acids will occur cannot be predicted. 

The variant nucleotides, according to the 
TSAR scheme, encode all twenty naturally occurring 
amino acids by use of 48 different codons. 
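
The codon arithmetic of the NNB/NNV scheme can be checked with a brief Python sketch under the standard genetic code. The names used here (CODE, codons, coding_from_nnv) are illustrative assumptions; the sketch treats the second strand as written 3' to 5', so its base-by-base complement gives the coding strand read 5' to 3'.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def codons(third_bases):
    """Codons with any base at positions 1 and 2 and a restricted third base."""
    return ["".join(c) for c in product("ACGT", "ACGT", third_bases)]


nnb = codons("GTC")  # B is G, T or C
nnv = codons("GAC")  # V is G, A or C (second strand, written 3' to 5')

# Complementing NNV base by base gives the codons seen on the coding strand.
coding_from_nnv = {"".join(COMPLEMENT[b] for b in c) for c in nnv}

stops = sorted(c for c in nnb if CODE[c] == "*")
sense = [c for c in nnb if CODE[c] != "*"]
amino_acids = {CODE[c] for c in sense}

print(len(nnb), stops, len(sense), len(amino_acids))
print(coding_from_nnv == set(nnb))
```

Under these assumptions both strands contribute the same 48-codon set to the coding strand, of which one codon is the TAG stop and the remainder cover all 20 amino acids.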

Invariant nucleotides are positioned at
particular sites in the nucleotide sequences to aid in
assembly and cloning of the synthesized
oligonucleotides. At the 5' termini of the sets of
variant nucleotides, the invariant nucleotides encode
efficient restriction enzyme cleavage sites. The
invariant nucleotides at the 5' termini are chosen to
encode pairs of sites for cleavage by restriction
enzymes (1) which can function in the same buffer
conditions; (2) which are commercially available at high
specific activity; (3) which are not complementary to each
other, to prevent self-ligation of the synthesized
double stranded oligonucleotides; and (4) which
require either 6 or 8 nucleotides for a cleavage
recognition site in order to lower the frequency of
cleaving within the inserted double stranded
synthesized oligonucleotide sequences. According to
one particular method of constructing peptide
libraries, the selected restriction site pairs are
selected from Xho I and Xba I, and Sal I and Spe I.
Other examples of useful restriction enzyme sites
include, but are not limited to: Nco I, Nsi I, Pal I,
Not I, Sfi I, Pme I, etc. Restriction sites at the 5'
termini invariant positions function to promote proper
orientation and efficient production of recombinant
molecules during ligation when the
oligonucleotides are inserted into an appropriate
expression vector.

According to an alternative method of
constructing peptide libraries, the variant
nucleotides are synthesized using one or more
methylated dNTPs and the 5' termini invariant
nucleotides, encoding restriction sites for efficient
cleavage, are synthesized using non-methylated dNTPs.
This embodiment provides for efficient cleavage of
long length synthesized oligonucleotides at the
termini for insertion into an appropriate vector,
while avoiding cleavage in the variant nucleotide
sequences.

The 3' termini invariant nucleotide
positions are complementary pairs of 6, 9 or 12
nucleotides to aid in annealing two synthesized single
stranded sets of nucleotides together and conversion
to double-stranded DNA, designated herein synthesized
double stranded oligonucleotides.

In particular peptide libraries, the 3'
termini invariant nucleotides are selected from
5' GCGGTG 3' and 3' CGCCAC 5', and 5' CCAGGT 3' and 3' GGTCCA 5',
which also encode either a particular amino acid,
glycine, or the dipeptide proline-glycine, which provides
the flexibility of either a swivel or hinge type
configuration to the expressed proteins, polypeptides
and/or peptides, respectively.

In yet another specific library, the 3'
termini invariant nucleotides of the coding strand are
5' GGGTGCGGC 3', which encode glycine, cysteine,
glycine. In an oxidizing environment the cysteine
forms a disulfide bond with another cysteine
engineered into the binding domain to form a semirigid
conformation in the expressed peptide.
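
As a quick check on two of the invariant sequences quoted above, the following Python sketch translates them in the frame in which they are written, using the standard genetic code. The translate helper is illustrative only. The GCGGTG pair is left out because the frame in which it is read is not stated here and this sketch does not attempt to model it.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a 5'->3' coding sequence, ignoring any trailing partial codon."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODE[dna[i:i + 3]] for i in range(0, usable, 3))


gly_cys_gly = translate("GGGTGCGGC")  # one-letter codes: G = Gly, C = Cys
pro_gly = translate("CCAGGT")         # one-letter codes: P = Pro, G = Gly

print(gly_cys_gly, pro_gly)
```

The one-letter results correspond to glycine-cysteine-glycine and proline-glycine, matching the description in the text.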

In another library, the complementary 3'
termini also encode an amino acid sequence that
provides a short charge cluster (for example, KKKK
(SEQ ID NO: 49), DDDD (SEQ ID NO: 50) or KDKD (SEQ ID
NO: 51)), or a sharp turn (for example, NPXY (SEQ ID
NO: 52), YXRF (SEQ ID NO: 53) where X is any amino
acid). In another alternative library, the
complementary 3' termini also encode a short amino
acid sequence that provides a peptide known to have a
desirable binding or other biological activity.
Specific examples include complementary pairs of
sequences encoding peptides including but not limited
to RGD, HAV, HPQθ where θ is a non-polar amino acid.
These short amino acid sequences are intended to aid
in screening the library and retrieving members of the
library; they are not intended to be binding domains.

Figure 1A generally illustrates an assembly
process. The oligonucleotide sequences are thus
assembled by a process comprising: synthesis of pairs
of single stranded nucleotides having a formula
represented by:

(a) 5' Restriction site-(NNB)n-Complementary site 3'
and
(b) 3' Complementary site-(NNV)m-Restriction site 5',

where n is an integer, such that 10 ≤ n ≤ 100, and m is
an integer, such that 10 ≤ m ≤ 100. More particularly,
the single stranded nucleotides are represented as:
pairs of nucleotide sequences of a first formula
5' X (NNB)n J Z 3' and a second nucleotide sequence of
the formula 3' Z' O U (NNV)m Y 5', where X and Y are
restriction enzyme recognition sites, such that X ≠ Y;

N is A, C, G or T;
B is G, T or C;
V is G, A or C;
n is an integer, such that 10 ≤ n ≤ 100;
m is an integer, such that 10 ≤ m ≤ 100;
Z and Z' are each a sequence of 6, 9 or 12
nucleotides, such that
Z and Z' are complementary to each other; and
J is A, C, G, T or nothing;
O is A, C, G, T or nothing; and
U is G, A, C or nothing; provided, however, if
any one of J, O or U is nothing then J, O and U are
all nothing.
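
One possible reading of the formulas above can be sketched in Python as follows. This is an illustration under stated assumptions and not the method of the specification: the function names are hypothetical, the restriction sites are supplied as they would read on the coding strand, strand (b) is held as a string written 3' to 5', and polymerase fill-in is modeled simply as base-by-base complementation of the portion of strand (b) lying beyond Z'.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq):
    """Base-by-base complement, without reversal."""
    return "".join(COMPLEMENT[b] for b in seq)


def variant_codons(count, third_bases, rng):
    """Random codons: any base at positions 1 and 2, restricted third base."""
    return "".join(
        rng.choice("ACGT") + rng.choice("ACGT") + rng.choice(third_bases)
        for _ in range(count)
    )


def assemble_top_strand(n, m, x_site, y_site, z, j="", o="", u="", seed=0):
    """Return the 5'->3' coding strand of the filled-in duplex (a sketch)."""
    if len({j == "", o == "", u == ""}) != 1:
        raise ValueError("J, O and U must be all present or all absent")
    rng = random.Random(seed)
    # Strand (a), written 5'->3': X (NNB)n J Z
    strand_a = x_site + variant_codons(n, "GTC", rng) + j + z
    # Strand (b), written 3'->5': Z' O U (NNV)m Y
    strand_b = (complement(z) + o + u
                + variant_codons(m, "GAC", rng) + complement(y_site))
    # Z pairs with Z'; fill-in extends strand (a) across the rest of (b).
    return strand_a + complement(strand_b[len(z):])


# Hypothetical example: Xho I (CTCGAG) and Xba I (TCTAGA) sites, Z = GCGGTG.
top = assemble_top_strand(10, 10, "CTCGAG", "TCTAGA", "GCGGTG")
first_thirds = top[8:36:3]    # third bases of the (NNB)n codons
second_thirds = top[44:72:3]  # third bases of the codons derived from (NNV)m
print(len(top), "A" in first_thirds, "A" in second_thirds)
```

In this reading, J, Z, O and U together span 9, 12 or 15 nucleotides when J, O and U are present, and 6, 9 or 12 when they are absent, so the variant codons on either side of the complementary site stay in the same reading frame.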

Any method for synthesis of the single
stranded sets of nucleotides is suitable, including
use of an automatic nucleotide synthesizer. The
synthesizer can be programmed so that the nucleotides
can be incorporated, either in equimolar or non-equimolar
ratios at the variant positions, i.e., N, B,
V, J, O or U. The nucleotide sequences of the desired
length are purified, for example, by HPLC.

Pairs of the purified, single stranded
nucleotides of the desired length are reacted together
in appropriate buffers through repetitive cycles of
annealing and DNA synthesis using an appropriate DNA
polymerase, such as Taq, Vent™ or Bst DNA polymerase,
and appropriate temperature cycling. Sequenase™
(modified T7 DNA polymerase) is preferred for use, and
can be employed according to the instructions of the
manufacturer (U.S. Biochemical, Cleveland, OH),
without temperature cycling. Klenow fragment of E.
coli DNA polymerase could be used but, as would be
understood by those of skill in the art, such
polymerase would need to be replenished at each cycle
and thus is less preferred. The double stranded DNA
reaction products, now greater than m + n in length,
are isolated, for example, by phenol/chloroform
extraction and precipitation with ethanol.

After resuspension in buffer, the double
stranded synthetic oligonucleotides are cleaved with
appropriate restriction enzymes to yield a plurality
of synthesized oligonucleotides. The double-stranded
synthesized oligonucleotides should be selected for
those of the appropriate size by means of high
resolution polyacrylamide gel electrophoresis, or
NuSieve/MetaMorph (FMC Corp., Rockland, MA) agarose
gel electrophoresis, or the like. Size selection of
the oligonucleotides substantially eliminates abortive
assembly products of inappropriate size and incomplete
digestion products.

This scheme for synthesis and assembly of
the unpredictable oligonucleotides used to construct
the TSAR libraries incorporates m + n variant,
unpredicted nucleotide sequences of the formula
(NNB)_(n+m), where B is G, T or C and n and m are each an
integer, such that 20 ≤ n + m ≤ 200; thus, from 20 to
200 unpredicted codons are incorporated into the
synthesized double stranded oligonucleotides. Such a
scheme provides a number of important advantages not
available with conventional libraries. As assembled,
the synthesized oligonucleotides encode all twenty
naturally occurring amino acids by use of 48 different
amino acid encoding codons. Although this uses
somewhat less variability than that found in nature
where 64 different codons are used, the scheme
advantageously provides greater variability than other
conventional schemes. For example, conventional
schemes in which the variant nucleotides have the
formula NNK, where K is G or T (see Dower, PCT
Publication No. WO 91/19818), or the formula NNS,
where S is C or G (see Devlin, PCT Publication No.
WO 91/18980), use only 32 different amino acid-encoding
codons. The use of a larger number of amino
acid encoding codons may make the TSAR libraries less
susceptible to codon preferences of the host when the
libraries are expressed. Although both the TSAR
scheme and conventional schemes retain only 1 stop
codon, use of NNB as taught in the TSAR scheme
advantageously provides synthesized oligonucleotides
in which the probability of a stop codon is decreased
compared to conventional NNS or NNK schemes.
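The codon counts quoted above can be checked by enumerating each scheme against the standard genetic code. The sketch below assumes equimolar mixes at every position; the helper name `summarize` is illustrative only. Counted this way, the totals of 48 and 32 include the single TAG stop codon, so the number of sense codons is one lower in each scheme.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, codons ordered T, C, A, G at each position.
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Bases allowed at the third codon position in each scheme.
SCHEMES = {"NNB": "GTC", "NNK": "GT", "NNS": "CG"}

def summarize(third_bases):
    """Return (total codons, list of stop codons, number of amino acids covered)."""
    codons = [a + b + c for a in "ACGT" for b in "ACGT" for c in third_bases]
    stops = [c for c in codons if CODON_TABLE[c] == "*"]
    residues = {CODON_TABLE[c] for c in codons} - {"*"}
    return len(codons), stops, len(residues)

for name, third in SCHEMES.items():
    total, stops, n_res = summarize(third)
    print(f"{name}: {total} codons, stops={stops}, "
          f"{n_res} amino acids, P(stop per codon)={len(stops)}/{total}")
```

With equimolar mixes the per-codon stop probability is 1/48 for NNB against 1/32 for NNK or NNS, which is the decrease the text refers to.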

Perhaps most significantly, the TSAR scheme
for synthesis and assembly of the oligonucleotides
provides sequences of oligonucleotides encoding
unpredicted amino acid sequences which are larger in
size than conventional libraries. The present
synthesized double stranded oligonucleotides comprise
at least about 77-631 nucleotides encoding the
restriction enzyme sites, the complementary site and
about 20-200 unpredicted amino acids in the TSAR
binding domain. According to a preferred embodiment,
n and m are greater than or equal to 10 and less than
or equal to 50. Thus, the synthesized double stranded
oligonucleotides comprise at least 77-331 nucleotides
and encode about 20-100 unpredicted amino acids in the
TSAR binding domain. In specific examples, the
synthesized oligonucleotides encode 20, 24 and 36
unpredicted amino acids and 27, 35 and 42 total amino
acids, respectively for the TSAR-9, TSAR-12, and
TSAR-13 libraries, in the TSAR binding domain.

According to an alternative embodiment,
syngenes are isolated from a library which expresses a
plurality of TSAR peptides having some degree of
conformational rigidity in their structure
(constrained or semirigid peptide libraries,
illustrated in Figure 1D-F and exemplified by the
TSAR-13 and TSAR-14 libraries described in Section 6.4
and its subsections).

At least four different methods can be used
to engineer TSAR libraries used to identify syngene
sequences encoding a binding domain, so that the
expressed peptides are semirigid or have some degree
of conformational rigidity. In the first method, the
synthesized oligonucleotides are designed so that the
expressed peptides have a pair of invariant cysteine
residues positioned in, or flanking, the unpredicted
or variant residues (see Figure 1D). When the library
is expressed in an oxidizing environment, the cysteine
residues should be in the oxidized state, most likely
cross-linked by disulfide bonds to form cystines.
Thus, the peptides would form rigid or semirigid
loops. The nucleotides encoding the cysteine residues
should be placed from 6 to 27 amino acids apart
flanking the variant nucleotide sequences. A
particular peptide library having such structure is
the R8C library described in Section 6.13 infra.

The actual positions of the invariant
residues can be modeled on the arrangement observed in
isolates from a linear peptide library, for example,
TSAR peptides in which two or four cysteines are
encoded by the inserted synthesized oligonucleotides,
isolated from the TSAR-9 or TSAR-12 libraries (see
Section 6.9 and its subsections infra). The following
general formulas illustrate the structure of these
peptides:

(1) X(NNB) S (TGC) (NNB) jjZ (NNB) l4 (TGC) (NNB) 3 Y 

(2) X(NNB) l (TGC) (NNB) „ (TGC) 2 (NNB) 4 Z (NNB) , (TGC 
30 )(NNB),Y; 

(3) X(NNB) xe (TGC) (NNB) >Z (NNB) lt (TGC) (NNB) {Y; 

(4) X (NNB) u (TGC) (NNB) t Z (NNB), (TGC) (NNB) , 0 Y 
The positions of the cysteines are well
tolerated as these phage are stable and infectious.

In the second method, a double stranded
oligonucleotide sequence providing a cloverleaf
structure (see Figure 1E) can be represented, for
example, by the formula:
X (TGC) x (NNB) 10 (TGC) x (NNB) 6 Z (NNB) 3 (TGC) x (NNB) 14 (TGC) X Y . 
When these peptides are expressed by the appropriate
vectors, the cysteine residues may adopt three
different disulfide bond arrangements, thereby
generating three different patterns of "cloverleafs".
The plurality of proteins, polypeptides and/or
peptides expressed by this type of rigid library
should form many different ligand binding pockets from
which to select the best fit. It should be noted that
when a semirigid library of the first or second type
above is expressed in a viral vector in an oxidizing
environment, there will likely be a selection against
odd numbers of cysteines occurring within the
unpredicted or random peptide regions expressed
because one unpaired cysteine residue will likely
cross-link the viral vectors and make them
non-infectious. Examples of semi-rigid libraries
providing a cloverleaf structure are the TSAR-13 and
TSAR-14 libraries described in Sections 6.9.4 and
6.9.4.1 infra.
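The count of three disulfide arrangements follows from pairing up four invariant cysteines, as in the cloverleaf formula with four TGC positions. The sketch below enumerates the pairings; the labels C1 to C4 are hypothetical names for the four cysteines in order along the peptide, and the enumeration ignores any steric preference among the arrangements.

```python
def pairings(items):
    """Yield every way to split an even-sized list into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

cysteines = ["C1", "C2", "C3", "C4"]
arrangements = list(pairings(cysteines))
for arrangement in arrangements:
    print(arrangement)
print(len(arrangements), "possible disulfide arrangements")
```

The same function shows why an odd number of cysteines is a problem: no complete pairing exists, so at least one thiol stays free.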

In the third method, the synthesized
nucleotides are designed and assembled so that the
plurality of proteins expressed have both invariant
cysteine and histidine residues positioned within the
variant nucleotide sequences (see Figure 1F). The
positions of the invariant residues can be modeled
after the arrangement of cysteine and histidine
residues seen in zinc finger proteins (i.e.,
-CX_(2-4)CX_12HX_(3-4)H-, where X is any amino acid),
thereby creating a library of zinc finger-like proteins. As
used herein the term "zinc finger-like proteins" is
intended to mean any of the plurality of proteins
expressed which contain invariant cysteine and
histidine residues which confer a zinc finger or
similar structure on the expressed protein.
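The zinc finger arrangement given above, with two to four residues between the cysteines, twelve before the first histidine and three or four between the histidines, maps directly onto a regular expression. The sketch below is a hypothetical screening aid for translated library sequences in one-letter code; the example peptide is invented for illustration.

```python
import re

# C, 2-4 of any residue, C, 12 of any residue, H, 3-4 of any residue, H.
ZINC_FINGER_LIKE = re.compile(r"C.{2,4}C.{12}H.{3,4}H")

def has_zinc_finger_like_motif(peptide):
    """Return True if the peptide contains the spacing pattern above."""
    return ZINC_FINGER_LIKE.search(peptide) is not None

# Invented example: alanine stands in for the variant residues.
example = "A" + "C" + "AA" + "C" + "A" * 12 + "H" + "AAA" + "H" + "G"
print(has_zinc_finger_like_motif(example))
```

A match shows only that the invariant residues are spaced as in the formula; it says nothing about whether the peptide folds or binds zinc.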
In the fourth method (see Figure 1F), the
plurality of proteins are designed to have invariant
histidine residues positioned within the variant
nucleotide sequences. To illustrate, exemplary
histidine containing TSARs can be represented by the
following general formulas:

(1) X(NNB) 4 (CAC) (NNB) 4 (CAC) (NNB) ,Z (NNB) ( (CAC) (NNB), (CAC 
) a (NNB) Y; 

X (NNB) , (CAC) (NNB), (CAC) (NNB) Z (CAC) (NNB) « (CAC) a (NNB) 
10 6 (CAC) (NNB) (CAC) (NNB) a Y; 

(3) X (NNB), (CAC) (NNB) , x (CAC) , (NNB) (CAC) (NNB) 2 Z (NNB) , (CAC 
) (NNB) 5 (CAC) 2 (NNB),Y; and 

(4) X(CAC) (NNB), (CAC) (NNB), (CAC) (NNB) 2 (CAC) (NNB) Z (CAC) 
(NNB) t (CAC) (NNB) 4 (CAC) (NNB) (CAC) (NNB) 3 Y, where CAC 

represents the codon for histidine.

To maintain the rigid cloverleaf
conformation of this plurality of proteins, the TSAR
proteins are expressed and harvested in the presence
of 1-1000 µM zinc chloride. The expressed proteins
could also be saturated with other divalent metal
cations, such as Cu2+ and Ni2+. The members of this
type of rigid library may have advantageous chemical
reactivity, since metal ions are often within the
catalytic sites of enzymes.

To prepare a library of syngenes encoding
semi-rigid or constrained peptides, the synthesized
single stranded nucleotides are assembled by annealing
a first nucleotide sequence of the formula:

5' X [α(NNB)_a]_c J Z 3' with a second nucleotide
sequence of the formula

3' Z' O U [(NNV)_b β]_d Y 5'

where a, c, b, d are integers such that 20 ≤ [a]c +
[b]d ≤ 200; and

c and d are each ≥ 1;
α is an invariant nucleotide sequence that
confers some structure in the peptide it encodes
and β is an invariant nucleotide sequence whose
complementary nucleotide sequence confers some
structure in the encoded peptide; and

X, Y, N, B, V, Z, Z', J, O, U are as defined
above.

This scheme for synthesis of unpredictable
oligonucleotides incorporates a total of the
arithmetic sum of (a × c) + (b × d), i.e., [a]c + [b]d
variant, unpredicted nucleotide sequences, i.e.,
[(NNB)_a]_c + [(NNV)_b]_d, flanked by invariant nucleotides,
i.e., α and β, which encode structure-conferring amino
acid sequences.

By way of example, α and β could include a
codon for one or more cysteine residues, for example
Gly-Cys-Gly, in which instance a and b are each
preferably ≥ 6 and ≤ 27, to generate disulfide bonds
between different cysteines in the expressed loop
forming peptide structures. Appropriate
oligonucleotides could be assembled, for example, by
annealing a first nucleotide sequence of the formula:
5' X α (NNB)_a α (NNB)_a Z 3' with a second nucleotide
sequence of the formula

3' Z' (NNV)_b β Y 5'.

More particularly, where α encodes the sequence
Gly-Cys-Gly (and the complementary sequence of β encodes
the same sequence), and where both a and b are equal to
seven, the synthesized single stranded nucleotides are
assembled by annealing a first nucleotide sequence:
5' X(GGG)(TGT)(GGG)(NNB)_7(GGG)(TGT)(GGG)(NNB)_7(GGG)(TGT)(GGG) 3'
with a second nucleotide sequence:
3' (CCC)(ACA)(CCC)(NNV)_7(CCC)(ACA)(CCC)Y 5'
where GGG represents the codon for glycine and TGT
represents the codon for cysteine. This
oligonucleotide scheme encodes peptides whose amino
acid sequence would be GCGX_7GCGX_7GCGX_7GCG.
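The Gly-Cys-Gly example can be traced in silico: anneal the two strands at their complementary ends, fill in, and translate. The sketch below omits the restriction sites X and Y and draws the variant positions at random; the helper `degenerate` and the fixed random seed are assumptions made for illustration. It also shows why the second strand uses NNV: the complement of V (G, A or C) is B (C, T or G), so the filled-in coding strand is NNB throughout.

```python
import random
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")
POOLS = {"N": "ACGT", "B": "GTC", "V": "GAC"}

def degenerate(pattern, repeats, rng):
    """Draw one concrete sequence for a degenerate codon pattern such as NNB."""
    return "".join(rng.choice(POOLS[s]) for _ in range(repeats) for s in pattern)

rng = random.Random(0)
gcg = "GGGTGTGGG"  # Gly-Cys-Gly on the coding strand
# First oligonucleotide, written 5'->3' (site X omitted).
first = gcg + degenerate("NNB", 7, rng) + gcg + degenerate("NNB", 7, rng) + gcg
# Second oligonucleotide, written 3'->5' as in the text (site Y omitted).
second = "CCCACACCC" + degenerate("NNV", 7, rng) + "CCCACACCC"

# The 3' end of the first strand base-pairs with the 3' end of the second.
assert second[:9].translate(COMPLEMENT) == first[-9:]
# Fill-in extends the first strand along the second; because the second strand
# is written 3'->5', its base-by-base complement continues the coding strand.
coding = first + second[9:].translate(COMPLEMENT)
peptide = "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, len(coding), 3))

print(peptide)
print(bool(re.fullmatch(r"GCG.{7}GCG.{7}GCG.{7}GCG", peptide)))
```

An asterisk in the printed peptide marks a TAG stop codon drawn by chance at a variant position.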
Alternatively, α and β could encode one or
more histidine residues, for example GHGHG (SEQ ID NO:
54). In yet another alternative embodiment, α and β
could encode a Leu residue in which instance a and b
are each ≤ about 7. Such alternative embodiment would
provide an alpha helical structure in the expressed
peptides.

Additionally, according to yet another
alternative embodiment, an α group could be used for
Z, and β for Z' to provide the complementary sequences
to aid in annealing the nucleotides.

Other nucleotide sequences encoding amino
acids that will impose structural constraints on the
expressed peptides are possible, as would become
apparent to one of skill in the art based on the above
description, and syngenes encoding such constrained
peptides are encompassed within the scope of the
present invention.

An additional feature of these semirigid
libraries is the potential to control the binding
properties of isolates by reversibly destroying or
altering the rigidity of the peptide. For example, it
should be possible to elute a TSAR bound to a
particular ligand in a gentle manner with reducing
agents (e.g., DTT, β-mercaptoethanol) or divalent
cation chelators (e.g., EDTA, EGTA). Such reagents
can be used, for example, to elute a TSAR library
expressed on phage vectors from target ligands. EDTA
or EGTA, at low concentrations, does not appear to
disrupt phage integrity or infectivity.

Once the phage have been recovered and it is
deemed necessary to remove thiols from the solution,
the reduced cysteine residues can be alkylated with
iodoacetamide. This treatment prevents renewed
disulfide bond formation and only diminishes phage
infectivity 10-100 fold, which is tolerable since
phage cultures usually attain titers of 10^12 plaque
forming units per milliliter. Alternatively, the
elution reagents can be removed by dialysis (e.g.,
dialysis bag, Centricon/Amicon microconcentrators).

5.1.3.2. EXPRESSION OF VECTORS ENCODING TSARS

In constructing a TSAR library, the
plurality of oligonucleotides of appropriate size
prepared as described above is inserted into an
appropriate vector which, when inserted into a
suitable host, expresses the plurality of peptides as
heterofunctional fusion proteins with an expressed
component of the vector (effector domain) which are
screened to identify TSARs having affinity for a
ligand of choice. According to an optional
embodiment, the plurality of peptides further comprise
a linking domain between the binding and effector
domains. In a preferred mode of this embodiment, the
linker domain is expressed as a fusion protein with
the effector domain of the vector into which the
plurality of oligonucleotides are inserted.

The skilled artisan will recognize that to
achieve transcription and translation of the plurality
of TSAR encoding oligonucleotides, the synthetic
oligonucleotides must be placed under the control of a
promoter compatible with the chosen vector-host
system. A promoter is a region of DNA at which RNA
polymerase attaches and initiates transcription. The
promoter selected may be any one that has been
synthesized or isolated that is functional in the
vector-host system. For example, E. coli, a commonly
used host system, has numerous promoters such as the
lac or trp promoter or the promoters of its
bacteriophages or its plasmids. Also, synthetic or
recombinantly produced promoters such as the P_TAC
promoter may be used to direct high level expression
of the gene segments adjacent to them.
Signals are also necessary in order to
attain efficient translation of the inserted
oligonucleotides. For example, in E. coli mRNA, a
ribosome binding site includes the translational start
codon AUG or GUG in addition to other sequences
complementary to the bases of the 3' end of 16S
ribosomal RNA. Several of these latter sequences such
as the Shine/Dalgarno (S/D) sequence have been
identified in E. coli and other suitable host cell
types. Any S/D-ATG sequence which is compatible with
the host cell system can be employed. These S/D-ATG
sequences include, but are not limited to, the S/D-ATG
sequences of the cro gene or N gene of bacteriophage
lambda, the tryptophan E, D, C, B or A genes, a
synthetic S/D sequence or other S/D-ATG sequences
known and used in the art. Thus, regulatory elements
control the expression of the polypeptide or proteins
to allow directed synthesis of the reagents in cells
and to prevent constitutive synthesis of products
which might be toxic to host cells and thereby
interfere with cell growth.

Any of a variety of vectors can be used,
including, but not limited to, bacteriophage vectors
such as φX174, λ, M13 and its derivatives, f1, fd,
Pf1, etc., phagemid vectors, plasmid vectors, insect
viruses, such as baculovirus vectors, mammalian cell
vectors, including such virus vectors as parvovirus
vectors, adenovirus vectors, vaccinia virus vectors,
retrovirus vectors, etc., yeast vectors such as Ty1,
killer particles, etc.

An appropriate vector contains or is
engineered to contain a gene encoding an effector
domain of a TSAR comprising the pIII protein or other
suitable protein or portion thereof providing for
phage display. The effector domain gene contains or
is engineered to contain multiple cloning sites. At
least two different restriction enzyme sites within
such gene, comprising a polylinker, are preferred.
The vector DNA is cleaved within the polylinker using
two different restriction enzymes to generate termini
complementary to the termini of the double stranded
synthesized oligonucleotides assembled as described
above. Preferably the vector termini after cleavage
have or are modified, using DNA polymerase, to have
non-compatible sticky ends that do not self-ligate,
thus favoring insertion of the double-stranded
synthesized oligonucleotides and hence formation of
recombinants expressing the TSAR fusion proteins,
polypeptides and/or peptides. The double stranded
synthesized oligonucleotides are ligated to the
appropriately cleaved vector using DNA ligase.

It is particularly useful to include a
"stuffer fragment" within the polylinker region of the
vector when the vector (e.g., phage or plasmid) is
intended to express the TSAR as a heterofunctional
fusion protein that is expressed on the surface of the
vector. As used in the present application, a
"stuffer fragment" is intended to encompass a
relatively short, i.e., about 24-45 nucleotides, known
DNA sequence flanked by at least 2 restriction enzyme
sites, useful for cloning, said DNA sequence coding
for a binding site recognized by a known ligand, such
as an epitope of a known monoclonal antibody. The
restriction enzyme sites at the termini of the stuffer
fragment are useful for insertion of the synthesized
double stranded oligonucleotides, resulting in
deletion of the stuffer fragment.

Because of the physical linkage between the
expressed heterologous fusion protein and the phage or
plasmid vector containing the stuffer fragment and
because the stuffer fragment can comprise a known DNA
sequence encoding a protein that is immunologically
active (i.e., an immunological marker), the presence
or absence of the stuffer fragment can be easily
detected either at the nucleotide level, by DNA
sequencing, PCR or hybridization, or at the amino acid
level, e.g., using an immunological assay. Such
determination allows rapid discrimination between
recombinant (TSAR expressing) vectors generated by
insertion of the synthesized double stranded
oligonucleotides and non-recombinant vectors.

According to a preferred embodiment, the
stuffer fragment comprises the DNA fragment encoding
the epitope of the human c-myc protein recognized by
the murine monoclonal antibody 9E10 having the amino
acid sequence EQKLISEEDLN (SEQ ID NO: 55) (Evan et
al., 1985, Mol. Cell. Biol. 5:3610-3616) with a short
flanking sequence of amino acids at the 5' and 3'
termini which serve as restriction enzyme sites, so
that the stuffer fragment can be removed and the
synthesized double stranded oligonucleotides can be
inserted using the restriction sites.

In another aspect, the stuffer fragment
provides an efficient means to remove any
non-recombinant vectors to enhance or enrich the
population of TSAR expressing vectors, if necessary.
Because the stuffer fragment is expressed, e.g., as an
immunologically active surface protein on the surface
of non-recombinant vectors, it provides an accessible
target for binding, e.g., to an immobilized antibody.
The non-recombinants thus could be easily removed from
a library, for example, by serial passage over a column
having the antibody immobilized thereon to enrich the
population of recombinant TSAR-expressing vectors in
the library.

In a preferred embodiment, the vector
providing for expression of the TSAR libraries is or
is derived from a filamentous bacteriophage, including
but not limited to an M13, f1, fd, Pf1, etc. vector
encoding a phage structural protein, preferably a
phage coat protein, such as pIII, pVIII, etc. In a
more preferred embodiment, the filamentous phage is an
M13-derived phage vector such as m655, m663, and m666
described in Fowlkes et al., 1992, BioTechniques
13:422-427, which encodes the structural coat protein
pIII.

The phage vector is chosen to contain or is
constructed to contain a cloning site located in the
5' region of a gene encoding a bacteriophage
structural protein so that the plurality of
synthesized double stranded oligonucleotides inserted
are expressed as fusion proteins on the surface of the
bacteriophage. This advantageously provides not only
a plurality of accessible expressed peptides but also
a physical link between the peptides and the
inserted oligonucleotides to provide for easy
screening and sequencing of the identified TSARs.
Alternatively, the vector is chosen to contain or is
constructed to contain a cloning site near the 3'
region of a gene encoding structural protein so that
the plurality of expressed proteins constitute
C-terminal fusion proteins.

According to a preferred embodiment, the
structural bacteriophage protein is pIII. The m663
vector described by Fowlkes et al. (1992,
BioTechniques 13:422-427), containing the pIII gene
having a c-myc epitope (comprising the "stuffer
fragment") introduced at the N-terminal end, flanked
by Xho I and Xba I restriction sites, may be used. The
library may be constructed by cloning the plurality of
synthesized oligonucleotides into a cloning site near
the N-terminus of the mature coat protein of the
appropriate vector, preferably the pIII protein, so
that the oligonucleotides are expressed as coat
protein-fusion proteins.

Alternatively, the plurality of
oligonucleotides is inserted into a phagemid vector.
Phagemids are utilized in combination with a defective
helper phage to supply missing viral proteins and
replicative functions. Helper phage useful for
propagation of M13 derived phagemids as viral
particles include but are not limited to M13 phage
K07, R408, VCS, etc. Generally, according to a
preferred mode of this embodiment, the appropriate
phagemid vector is constructed by engineering the
Bluescript II SK+ vector (GenBank #52328) (Alting-Mees
et al., 1989, Nucl. Acids Res. 17:9494) to contain (1)
a truncated portion of the M13 pIII gene, i.e.,
nucleotides encoding amino acid residues 198-406 of
the mature pIII, (2) the PelB signal leader with an
upstream ribosome binding site and a short polylinker
of Pst I, Xho I, Hind III, and Xba I restriction
sites, in which the Xho I and Xba I sites are
positioned so the synthesized double stranded
oligonucleotides could be cloned and expressed in the
same reading frame as the m663 phage vector; and (3)
the linker sequence encoding GGGGS (SEQ ID NO: 56)
between the polylinker and the pIII gene.

Alternatively, the synthesized
oligonucleotides are inserted into a plasmid vector.
An illustrative suitable plasmid vector for expressing
the TSAR libraries is a derivative of plasmid p340-1
(ATCC No. 40516).

In order to obtain the appropriate p340-1
derivative suitable as an expression vector, the Nco I
- Bam HI fragment is removed from p340-1 plasmid and
replaced by a double stranded sequence having Xho I
and Xba I restriction sites in the correct reading
frame. In practice, p340-1 is cleaved using
restriction enzymes at the Bgl II and Xba I sites and
annealed with two oligonucleotides:

(1) 5'-CATGGCTCGAGGCTGAGTTCTAGA-3' (SEQ ID NO: 57) and
(2) 5'-GATCTCTAGAACTCAGCCTCGAGC-3' (SEQ ID NO: 58)

having Nco I and Bam HI sticky ends. After ligation
and transformation of E. coli, recombinants containing
the desired plasmid designated p340-1D are selected
based on the inserted SEQ ID NOS: 57 and 58 and
verified by sequencing. Like the parent p340-1, the
desired p340-1D does not produce functional
β-galactosidase because this gene is out of frame.
Thus, when the synthesized double stranded
oligonucleotides are inserted, using the Xho I and Xba
I restriction sites, into the p340-1D vector the
coding frame is restored and the TSAR binding domain
is expressed as a fusion protein with the
β-galactosidase. When exposed to IPTG, the vectors
expressing the TSAR library would produce identifiable
blue colonies.
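The two linker oligonucleotides (SEQ ID NOS: 57 and 58) can be checked for the properties the text relies on: a base-paired core, 5' overhangs matching Nco I (CATG) and Bam HI (GATC) ends, and internal Xho I and Xba I sites. The sketch below is a plain string check with illustrative variable names, not a cloning simulation. A 5' GATC overhang is also what Bgl II leaves, which is why a Bam HI-type end can be joined to a Bgl II-cut vector.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]

oligo_1 = "CATGGCTCGAGGCTGAGTTCTAGA"  # SEQ ID NO: 57
oligo_2 = "GATCTCTAGAACTCAGCCTCGAGC"  # SEQ ID NO: 58

# Split each oligonucleotide into its 4-nt 5' overhang and 20-nt duplex core.
overhang_1, core_1 = oligo_1[:4], oligo_1[4:]
overhang_2, core_2 = oligo_2[:4], oligo_2[4:]

print("cores base-pair:", core_1 == reverse_complement(core_2))
print("5' overhangs:", overhang_1, overhang_2)
print("Xho I site (CTCGAG) starts at index", oligo_1.find("CTCGAG"))
print("Xba I site (TCTAGA) starts at index", oligo_1.find("TCTAGA"))
```

Indices are zero-based positions on SEQ ID NO: 57.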

Another illustrative plasmid vector useful
to express a TSAR library is a plasmid derivative of
plasmid pTrc99A (Amann et al., 1988, Gene 69:301-315)
(Pharmacia, Piscataway, NJ) designated plasmid pLamB
which is constructed to contain the LamB protein gene
of E. coli (Clement and Hofnung, 1981, Cell
27:507-514) having a cloning site so that the
plurality of oligonucleotides inserted are expressed
as fusion proteins of the LamB protein.

Once the appropriate expression vectors are
prepared, they are inserted into an appropriate host,
such as E. coli, Bacillus subtilis, insect cells,
mammalian cells, yeast cells, etc., for example by
electroporation, and the plurality of oligonucleotides
is expressed by culturing the transfected host cells
under appropriate culture conditions for colony or
phage production. Preferably, the host cells are
protease deficient, and may or may not carry
suppressor tRNA genes.

A small aliquot of the electroporated cells
is plated and the number of colonies or plaques is
counted to determine the number of recombinants. The
library of recombinant vectors in host cells is plated
at high density for a single amplification of the
recombinant vectors.

For example, recombinant M13 vector m666,
m655 or m663, engineered to contain the synthesized
double stranded oligonucleotides, is transfected into
DH5αF' E. coli cells by electroporation. TSARs are
expressed on the outer surface of the viral capsid
extruded from the host E. coli cells and are
accessible for screening. The parent m666, m655 or
m663 vectors contain the c-myc epitope (stuffer
fragment). When the double stranded synthesized
oligonucleotides are inserted between the Xho I and
Xba I sites, the stuffer fragment is removed. The
cloning efficiency of the expressed library is easily
determined by filter blotting with the 9E10 antibody
that recognizes the c-myc epitope.

Alternatively, when the double stranded
synthesized oligonucleotides are cloned just at the
Xho I or Xba I site, the c-myc epitope is retained.
Then the c-myc epitope is expressed in the pIII-fusion
protein expressed by the vector. An advantage of the
m663 vector is that it contains an intact LacZ+ gene,
which can be easily seen as a blue dot when expressed
in E. coli plated on X-gal and IPTG.

TSARs can be expressed in a plasmid vector
contained in bacterial host cells such as E. coli.
The TSAR proteins accumulate inside the E. coli cells
and a cell lysate is prepared for screening. Use of
plasmid p340-1D is described as an illustrative
example. A TSAR library in p340-1D, as described
above, expresses the co-functional fusion protein with
β-galactosidase. In the parent vector (without
synthetic oligonucleotide) the β-galactosidase gene is
out of frame and therefore nonfunctional. When plated
on LB plates with ampicillin, IPTG and X-gal,
colonies that carry TSAR oligonucleotides are
blue, whereas colonies harboring non-recombinant
p340-1D or p340-1D recombinants with oligonucleotides
carrying unsuppressed stop codons will be white. The
relative number of blue and white colonies reveals the
percent recombinants, and is useful in estimating the
total numbers of recombinants in the library, and is
also useful in screening.
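The blue/white arithmetic can be written out as follows. The colony counts and the total transformant number are invented for illustration, and the function names are hypothetical. Because recombinants carrying an unsuppressed stop codon also score white, the blue fraction is a lower bound on the true recombinant fraction.

```python
def percent_recombinants(blue, white):
    """Percentage of counted colonies that are blue (in-frame recombinants)."""
    total = blue + white
    if total == 0:
        raise ValueError("no colonies counted")
    return 100.0 * blue / total

def estimated_recombinants(blue, white, total_transformants):
    """Scale the blue fraction of a counted sample up to the whole library."""
    return total_transformants * blue / (blue + white)

# Hypothetical plate: 180 blue and 20 white colonies from a library of 10^8 transformants.
print(percent_recombinants(180, 20))
print(estimated_recombinants(180, 20, 10**8))
```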

Phagemid vectors containing the synthesized
double stranded oligonucleotides, expressed on the
outer surface of the extruded phage, are propagated
either as infected bacteria or as bacteriophage with
helper phage.

The expressed pDAF2-3 phagemids (see Section
6.10 and its subsections) have the added advantage
that they include the c-myc gene which can serve as an
"epitope tag" for the fusion pIII proteins.
Approximately 0.1-10% of the phage carrying the
phagemid genome incorporate the fusion pIII molecule.
The intactness of the chimeric pIII proteins is
evaluated based on the expression of the c-myc
epitope. By following the expression of the c-myc
epitope using the 9E10 antibody, it is possible to
monitor the successful incorporation of the fusion
pIII molecule into the M13 viral particle.

Also, when expressing pDAF2, if the upstream
c-myc peptide is detected immunologically using the
9E10 antibody, then it can be assumed that the
downstream TSAR peptide, encoded by the synthesized
oligonucleotide, is appropriately expressed.

In addition, it may be of value to
electroporate several different strains of E. coli and
establish different versions of the same library. Of
course, the same E. coli strain would need to be used
for the entire set of screening experiments. This
strategy is based on the consideration that there is
likely an in vivo biological selection, both positive
and negative, on the viral assembly, secretion, and
infectivity rate of individual M13 recombinants due to
the sequence nature of the peptide-pIII fusion
proteins. Therefore, E. coli with different genotypes
(e.g., chaperonin overexpressing, or secretion
enhanced) will serve as bacterial hosts, because they
will yield libraries that differ in subtle,
unpredictable ways.

5.2. SCREENING OF PEPTIDE LIBRARIES
TO IDENTIFY BINDING DOMAINS
ENCODED BY SYNGENES



The desired random peptide library is 
screened to identify and recover a syngene encoding a 
binding domain that binds to a ligand of choice. 

As used in the present invention, a ligand
is a substance for which it is desired to isolate a
specific binding partner from a synthetic random
peptide library. The term "ligand" is thus intended
to include but not be limited to a substance,
including a molecule or portion thereof, for which a
proteinaceous receptor naturally exists or can be
prepared according to the method of the invention.
For example, a binding domain which binds to a ligand
can function as a receptor, i.e., a lock into which
the ligand fits and binds; or a binding domain can
function as a key which fits into and binds a ligand
when the ligand is a larger protein molecule. In this
invention, a ligand includes, but is not limited to, a
non-ionic chemical group, an organic chemical group,
an ion, a metal, a metal or non-metal inorganic ion, a
glycoprotein, a protein, a polypeptide, a peptide, a
nucleic acid, a carbohydrate or carbohydrate polymer,
a lipid, a fatty acid, a viral particle, a membrane
vesicle, a cell wall component, a synthetic organic
compound, a bioorganic compound and an inorganic
compound or any portion of any of the above. Ligands
also include the variable region of an antibody, an
enzyme/substrate binding site, an enzyme/co-factor
binding site, a regulatory DNA binding protein, an RNA
binding protein, a binding site of a metal binding
protein, a nucleotide fold or GTP binding protein, a
calcium binding protein, a membrane protein, a viral
protein and an integrin. In another embodiment, the
ligand is a peptide that is an intracellular targeting
or processing signal (e.g., nuclear localization
signals), e.g., whereby the syngene-encoded binding
peptide which recognizes it will interfere with proper
targeting of in vivo proteins containing such a
targeting signal.

A preferred method for identifying syngenes
that encode a binding domain that binds to a ligand of
choice comprises screening a library of recombinant
vectors that express a plurality of heterofunctional
fusion proteins, said fusion proteins comprising (a) a
binding domain encoded by an oligonucleotide
comprising unpredictable nucleotides in which the
unpredictable nucleotides are arranged in one or more
contiguous sequences, wherein the total number of
unpredictable nucleotides is greater than or equal to
about 60 and less than or equal to about 600, and
optionally, (b) an effector domain encoded by an
oligonucleotide sequence which is a protein or peptide
that enhances expression or detection of the binding
domain. Screening is done by contacting the plurality
of heterofunctional fusion proteins with the ligand of
choice under conditions conducive to ligand binding
and then isolating the fusion proteins which bind to
the ligand. The methods of the invention further
preferably comprise determining the nucleotide
sequence encoding the binding domain of the
identified heterofunctional fusion protein, both to
determine the syngene sequence that encodes the
binding domain and simultaneously to deduce the amino
acid sequence of the binding domain. Nucleotide
sequence analysis can be carried out by any method
known in the art, including but not limited to the
method of Maxam and Gilbert (1980, Meth. Enzymol.
65:499-560), the Sanger dideoxy method (Sanger et al.,
1977, Proc. Natl. Acad. Sci. U.S.A. 74:5463), the use
of T7 DNA polymerase (Tabor and Richardson, U.S.
Patent No. 4,795,699; Sequenase™, U.S. Biochemical
Corp.), or Taq polymerase, or use of an automated DNA
sequenator (e.g., Applied Biosystems, Foster City,
CA).
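The step of deducing the amino acid sequence of a binding domain from its determined nucleotide sequence can be illustrated with a minimal sketch. It assumes the standard genetic code and a coding-strand sequence that is already in frame from its first base; the function name `translate` and the example sequence used below are hypothetical and are not taken from the specification.

```python
# Standard genetic code, laid out in the conventional T, C, A, G ordering
# of first, second and third codon positions.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding-strand DNA sequence read in frame from base 1.

    Stop codons are written as '*', codons containing anything other than
    A, C, G or T are written as 'X', and an incomplete trailing codon is
    ignored.
    """
    dna = dna.upper().replace(" ", "")
    usable = len(dna) - len(dna) % 3
    codons = (dna[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


if __name__ == "__main__":
    # Hypothetical insert sequence, for illustration only.
    print(translate("ATGGAACAAAAACTGATTTCT"))
```

In practice the reading frame would be fixed by the invariant vector sequence flanking the insert, so the sequence passed to such a function would begin at the first codon of the binding domain.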

Alternatively, syngenes encoding binding
domains are identified by a method comprising
identifying their encoded binding domain protein
(and/or peptide) which binds to a ligand of choice,
comprising: (a) generating a library of vectors
expressing a plurality of heterofunctional fusion
proteins comprising (i) a binding domain encoded by a
double stranded oligonucleotide comprising
unpredictable nucleotides in which the unpredictable
nucleotides are arranged in one or more contiguous
sequences, wherein the total number of unpredictable
nucleotides is greater than or equal to about 60 and
less than or equal to about 600 and the contiguous
sequences are flanked by invariant residues designed
to encode amino acids that confer a desired structure
to the binding domain of the expressed
heterofunctional fusion protein, and, optionally, (ii)
an effector domain encoded by an oligonucleotide
sequence encoding a protein or peptide that enhances
expression or detection of the binding domain; and (b)
screening the library of vectors by contacting the
plurality of heterofunctional fusion proteins with the
ligand of choice under conditions conducive to ligand
binding and isolating the heterofunctional fusion
protein which binds to the ligand. Additionally, the
methods of the invention further comprise determining
the nucleotide sequence encoding the binding domain of
the identified heterofunctional fusion protein to
deduce the amino acid sequence of the binding domain.
Most importantly, however, in the present invention
the nucleotide sequence encoding the binding domain of
the identified heterofunctional fusion protein is
determined in order to identify the syngene that
encodes said binding domain.

Once a suitable random peptide library has
been constructed (or otherwise obtained), the library
is screened to identify peptides having binding
affinity for a ligand of choice. Screening the
libraries can be accomplished by any of a variety of
methods known to those of skill in the art. See,
e.g., the following references, which disclose
screening of peptide libraries: Parmley and Smith,
1989, Adv. Exp. Med. Biol. 251:215-218; Scott and
Smith, 1990, Science 249:386-390; Fowlkes et al.,
1992, BioTechniques 13:422-427; Oldenburg et al.,
1992, Proc. Natl. Acad. Sci. USA 89:5393-5397; Yu et
al., 1994, Cell 76:933-945; Staudt et al., 1988,
Science 241:577-580; Bock et al., 1992, Nature
355:564-566; Tuerk et al., 1992, Proc. Natl. Acad.
Sci. USA 89:6988-6992; Ellington et al., 1992, Nature
355:850-852; U.S. Patent No. 5,096,815, U.S. Patent
No. 5,223,409, and U.S. Patent No. 5,198,346, all to
Ladner et al.; and Rebar and Pabo, 1993, Science
263:671-673.

If the libraries are expressed as fusion
proteins with a cell surface molecule, then screening
is advantageously achieved by contacting the vectors
with an immobilized target ligand and harvesting those
vectors that bind to said ligand. Such useful
screening methods, designated "panning" techniques, are
described in Fowlkes et al., 1992, BioTechniques
13:422-427. In panning methods useful to screen the
libraries, the target ligand can be immobilized on
plates, on beads such as magnetic beads, sepharose,
etc., or on beads used in columns. In particular
embodiments, the immobilized target ligand can be
"tagged", e.g., using biotin or 2-fluorochrome,
e.g., for FACS sorting.

In one embodiment, presented by way of
example but not limitation, screening a library of
phage expressing random peptides on phage and phagemid
vectors can be achieved as follows using magnetic
beads. Target ligands are conjugated to magnetic
beads, according to the instructions of the
manufacturers. To block non-specific binding to the
beads, and any unreacted groups, the beads are
incubated with excess bovine serum albumin (BSA). The
beads are then washed with numerous cycles of
suspension in phosphate buffered saline (PBS; 137 mM
NaCl, 2.7 mM KCl, 4.3 mM Na₂HPO₄·7H₂O, 1.4 mM KH₂PO₄, pH
7.3) with 0.05% Tween® 20 and recovered with a strong
magnet along the sides of a plastic tube. The beads
are then stored with refrigeration, until needed.

In the screening experiments, an aliquot of
a library is mixed with a sample of resuspended beads.
The tube contents are tumbled at 4°C for 1-2 hrs. The
magnetic beads are then recovered with a strong magnet
and the liquid is removed by aspiration. The beads
are then washed by adding PBS-0.05% Tween® 20,
inverting the tube several times to resuspend the
beads, and then drawing the beads to the tube wall
with the magnet. The contents are then removed and
washing is repeated 5-10 additional times. A solution
of 50 mM glycine-HCl (pH 2.0), 100 µg/ml BSA is added
to the washed beads to denature proteins and release
bound phage. After a short incubation time, the beads
are pulled to the side of the tubes with a strong
magnet and the liquid contents are then transferred to
clean tubes. 1 M Tris-HCl (pH 7.5) or 1 M NaH₂PO₄ (pH
7) is added to the tubes to neutralize the pH of the
phage sample. The phage are then diluted, e.g., 10⁻³
to 10⁻⁶, and aliquots plated with E. coli DH5αF' cells
to determine the number of plaque forming units of the
sample. In certain cases, the platings are done in
the presence of XGal and IPTG for color discrimination
of plaques (i.e., lacZ+ plaques are blue, lacZ-
plaques are white). The titer of the input samples is
also determined for comparison (dilutions are
generally 10** to 10'*) . 
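The titer arithmetic implied by the plating step above can be sketched as follows. This is an illustrative calculation only: the function names, the plaque count, the plated volume and the dilution in the example are hypothetical values chosen for the sketch and do not appear in the specification.

```python
def titer_pfu_per_ml(plaques: int, dilution: float, plated_volume_ml: float) -> float:
    """Estimate plaque forming units per ml of the undiluted sample.

    `dilution` is the fraction of the original sample present in the
    plated aliquot (e.g. 1e-6 for a millionfold dilution).
    """
    if plaques < 0 or dilution <= 0 or plated_volume_ml <= 0:
        raise ValueError("plaques must be >= 0; dilution and volume must be > 0")
    return plaques / (dilution * plated_volume_ml)


def recovery_fraction(output_pfu: float, input_pfu: float) -> float:
    """Fraction of input phage recovered after one round of screening."""
    if input_pfu <= 0:
        raise ValueError("input_pfu must be > 0")
    return output_pfu / input_pfu


if __name__ == "__main__":
    # Hypothetical numbers: 150 plaques from 0.1 ml of a 1e-6 dilution.
    titer = titer_pfu_per_ml(150, 1e-6, 0.1)
    print(f"titer of sample: {titer:.2e} pfu/ml")
    # Hypothetical comparison of eluted phage against the input sample.
    print(f"recovery: {recovery_fraction(1.5e5, titer):.1e}")
```

Comparing the recovery fraction between successive rounds of screening is one common way to judge whether binders are being enriched, which is the purpose of titering both the eluted and the input samples.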

Alternatively, as yet another non-limiting
example, screening a library of phage expressing
random peptides can be achieved by panning using
microtiter plates. Target ligand is diluted, e.g., in
100 mM NaHCO₃, pH 8.5, and a small aliquot of ligand
solution is adsorbed onto wells of microtiter plates
(e.g., by incubation overnight at 4°C). An aliquot of
BSA solution (1 mg/ml, in 100 mM NaHCO₃, pH 8.5) is
added and the plate incubated at room temperature for
1 hr. The contents of the microtiter plate are
flicked out and the wells washed carefully with PBS-
0.05% Tween® 20. The plates are washed repeatedly
until free of unbound target. A small aliquot of phage
solution is introduced into each well and the wells
are incubated at room temperature for 1-2 hrs. The
contents of the microtiter plates are flicked out and
the wells are washed repeatedly. The plates are
incubated with wash solution in each well for 20
minutes at room temperature to allow bound phage with
rapid dissociation rates to be released. The wells are
then washed five more times to remove all unbound
phage.

In a preferred method for recovering the
phage bound to the wells, a pH change is used. An
aliquot of 50 mM glycine-HCl (pH 2.0), 100 µg/ml BSA
solution is added to the washed wells to denature
proteins and release bound phage. After 10 minutes at
65°C, the contents are then transferred into clean
tubes, and a small aliquot of 1 M Tris-HCl (pH 7.5) or
1 M NaH₂PO₄ (pH 7) is added to neutralize the pH of the
phage sample. The phage are then diluted, e.g., 10⁻¹
to 10⁻⁴, and aliquots plated with E. coli DH5αF' cells
to determine the number of plaque forming units of
the sample. In certain cases, the platings are done
in the presence of XGal and IPTG for color
discrimination of plaques (i.e., lacZ+ plaques are
blue, lacZ- plaques are white). The titer of the
input samples is also determined for comparison
(dilutions are generally lO"* to 10" 9 ) . Alternatively, 
to recover bound phage, a large volume, approximately
100 µl, of LB + ampicillin is added to each well and
the plate is incubated at 37°C for 2 hr. The bound
cells undergo cell division in the rich culture medium
and the daughter cells detach from the immobilized
targets. The contents of the wells are then
transferred to a culture flask that contains ~10 ml
LB + ampicillin. When the cells are at log-phase,
inducer is added again to the culture to generate more
of the encoded proteins. These cells are then
harvested by centrifugation and rescreened.

By way of another example, the libraries
expressing random peptides as a surface protein of
either a vector or a host cell, e.g., phage or
bacterial cell, can be screened by passing a solution
of the library over a column of a ligand immobilized
to a solid matrix, such as sepharose, silica, etc.,
and recovering those phage that bind to the column
after extensive washing and elution.

By way of yet another example, weak binding
library members can be isolated based on retarded
chromatographic properties. According to one mode of
this embodiment for screening, fractions are collected
as they come off the column, saving the trailing
fractions (i.e., those members that are retarded in
mobility relative to the peak fraction are saved).
These members are then concentrated and passed over
the column a second time, again saving the retarded
fractions. Through successive rounds of
chromatography, it is possible to isolate those that
have some affinity, albeit weak, to the immobilized
ligand. These library members are retarded in their
mobility because of the millions of possible ligand
interactions as the member passes down the column. In
addition, this methodology selects those members that
have modest affinity to the target, and which also
have a rapid dissociation time. If desired, the
oligonucleotides encoding the binding domain selected
in this manner can be mutagenized, expressed and
rechromatographed (or screened by another method) to
discover improved binding activity.

According to another example,
homobifunctional (e.g., DSP, DST, BSOCOES, EGS, DMS)
or heterobifunctional (e.g., SPDP) cross-linking
agents can be used in combination with any of the
above methods, to promote capture of weak binding
members; these cross-linkers should be reversible,
with a treatment (i.e., exposure to thiols, base,
periodate, hydroxylamine) gentle enough not to disrupt
the member's structure or infectivity, to allow
recovery of the library member. The elution reagents
can be removed by dialysis (i.e., dialysis bag,
Centricon/Amicon microconcentrators).

According to another alternative method,
screening a library can be achieved using a method
comprising a first "enrichment" step and a second
filter lift step as follows.

Random peptides from an expressed library
capable of binding to a given ligand ("positives") are
initially enriched by one or two cycles of panning or
affinity chromatography, as described above. The goal
is to enrich the positives to a frequency of greater
than about 1/10⁵. Following enrichment, a filter lift
assay is conducted. For example, approximately 1-2 x
10⁵ phage, enriched for binders, are added to 500 µl of
log phase E. coli and plated on a large LB-agarose
plate with 0.7% agarose in broth. The agarose is
allowed to solidify, and a nitrocellulose filter
(e.g., 0.45 µ) is placed on the agarose surface. A
series of registration marks is made with a sterile
needle to allow re-alignment of the filter and plate
following development as described below. Phage
plaques are allowed to develop by overnight incubation
at 37°C (the presence of the filter does not inhibit
this process). The filter is then removed from the
plate with phage from each individual plaque adhered in
situ. The filter is then exposed to a solution of BSA
or other blocking agent for 1-2 hours to prevent non-
specific binding of the ligand (or "probe").

The probe itself is labeled, for example,
either by biotinylation (using commercial NHS-biotin)
or direct enzyme labeling, e.g., with horseradish
peroxidase (HRP) or alkaline phosphatase. Probes
labeled in this manner are indefinitely stable and can
be re-used several times. The blocked filter is
exposed to a solution of probe for several hours to
allow the probe to bind in situ to any phage on the
filter displaying a peptide with significant affinity
to the probe. The filter is then washed to remove
unbound probe, and then developed by exposure to
enzyme substrate solution (in the case of directly
labeled probe) or further exposed to a solution of
enzyme-labeled avidin (in the case of biotinylated
probe). In a preferred method, an HRP-labeled probe
is detected by ECL western blotting methods (Amersham,
Arlington Heights, IL), which involve using luminol
in the presence of phenol to yield enhanced
chemiluminescence detectable by brief exposure of film
by autoradiography, in which the exposed areas of film
correspond to positive plaques on the original plate.
Where an enzyme substrate is used, positive phage
plaques are identified by localized deposition of
colored enzymatic cleavage product on the filter which
corresponds to plaques on the original plate. The
developed filter or film, as the case may be, is
simply realigned with the plate using the registration
marks, and the "positive" plaques are cored from the
agarose to recover the phage. Because of the high
density of plaques on the original plate, it is
usually impossible to isolate a single plaque from the
plate on the first pass. Accordingly, phage recovered
from the initial core are re-plated at low density and
the process is repeated to allow isolation of
individual plaques and hence single clones of phage.
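The relationship between the enrichment target and the number of phage plated in the filter lift assay can be illustrated with a short calculation. The sketch below assumes that positives are randomly distributed among the plated phage, so that the number of positives per plate is approximately Poisson distributed; that statistical model and the function names are assumptions introduced here for illustration and are not part of the described method.

```python
import math


def expected_positives(phage_plated: float, positive_frequency: float) -> float:
    """Expected number of positive plaques on one plate."""
    if phage_plated < 0 or not 0 <= positive_frequency <= 1:
        raise ValueError("phage_plated must be >= 0 and frequency within [0, 1]")
    return phage_plated * positive_frequency


def prob_at_least_one(phage_plated: float, positive_frequency: float) -> float:
    """Probability that a plate carries at least one positive plaque,
    under the Poisson approximation (an assumption of this sketch)."""
    return 1.0 - math.exp(-expected_positives(phage_plated, positive_frequency))


if __name__ == "__main__":
    # Frequency of 1/10^5 after enrichment, 1-2 x 10^5 phage per plate.
    for plated in (1e5, 2e5):
        mean = expected_positives(plated, 1e-5)
        p = prob_at_least_one(plated, 1e-5)
        print(f"{plated:.0e} phage plated: {mean:.1f} expected positives, "
              f"P(at least one) = {p:.2f}")
```

Under these assumptions, a plate of this size at the stated frequency carries only about one or two positives on average, which suggests why enrichment beyond that frequency, or plating of more than one plate, makes the filter lift step more likely to yield a positive.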

Successful screening experiments are
optimally conducted using 3 rounds of serial
screening. The recovered cells are then plated at a
low density to yield isolated colonies for individual
analysis. The individual colonies are selected and
used to inoculate LB culture medium containing
ampicillin. After overnight culture at 37°C, the
cultures are then spun down by centrifugation.
Individual cell aliquots are then retested for binding
to the target ligand attached to a solid support.
Binding to other supports, having attached thereto a
non-relevant ligand, can be used as a negative
control.

One important aspect of screening the
libraries is that of elution. For clarity of
explanation, the following is discussed in terms of
TSAR expression by phage; however, it is readily
understood that such discussion is applicable to any
system where the random peptide is expressed on a
surface fusion molecule. It is conceivable that the
conditions that disrupt the peptide-target
interactions during recovery of the phage are specific
for every given peptide sequence from a plurality of
proteins expressed on phage. For example, certain
interactions may be disrupted by acid pH's but not by
basic pH's, and vice versa. Thus, it may be desirable
to test a variety of elution conditions (including but
not limited to pH 2-3, pH 12-13, excess target in
competition, detergents, mild protein denaturants,
urea, varying temperature, light, presence or absence
of metal ions, chelators, etc.) and compare the
primary structures of the TSAR proteins expressed on
the phage recovered for each set of conditions to
determine the appropriate elution conditions for each
ligand/TSAR combination. Some of these elution
reagents may be incompatible with phage infection
because they are bactericidal and will need to be
removed by dialysis (i.e., dialysis bag,
Centricon/Amicon microconcentrators).

The ability of different expressed proteins
to be eluted under different conditions may not only
be due to the denaturation of the specific peptide
region involved in binding to the target but also may
be due to conformational changes in the flanking
regions. These flanking sequences may also be
denatured in combination with the actual binding
sequence; these flanking regions may also change their
secondary or tertiary structure in response to
exposure to the elution conditions (i.e., pH 2-3, pH
12-13, excess target in competition, detergents, mild
protein denaturants, urea, heat, cold, light, metal
ions, chelators, etc.) which in turn leads to the
conformational deformation of the peptide responsible
for binding to the target.

According to another alternative method in
which the TSARs contain a linker region between the
binding domain and the effector domain, particular
TSAR libraries can be prepared and screened by: (1)
engineering a vector, preferably a phage vector, so
that a DNA sequence that encodes a segment of Factor
Xa (or a Factor Xa protease cleavable peptide) is
present adjacent to the gene encoding the effector
domain, e.g., the pIII coat protein gene; (2)
constructing and assembling the double stranded
synthetic oligonucleotides as described above and
inserting them into the engineered vector; (3)
expressing the plurality of vectors in a suitable host
to form a library of vectors; (4) screening for
binding to an immobilized ligand; (5) washing away
excess phage; and (6) treating the entire library with
Factor Xa protease. The particle will be uncoupled
from the peptide-ligand complex and can then be used
to infect bacteria to regenerate the particle with its
full-length pIII molecule for additional rounds of
screening. This alternative embodiment advantageously
allows the use of universally effective elution
conditions and thus allows identification of phage
expressing TSARs that otherwise might not be recovered
using other known methods for elution. To illustrate,
using this embodiment, exceptionally tight binding
TSARs could be recovered.

In a particular embodiment of the present
invention, TSAR libraries are screened with
oligonucleotides to identify syngenes encoding binding
domains specific for particular DNA sequences. When
the library is screened against oligonucleotide
targets, binding is done with high levels of salts,
e.g., to maximize the hydrophobic interactions that
are characteristic of specific protein/DNA
interactions and to minimize non-specific
interactions. Non-specific protein/DNA interactions
are generally electrostatic and can be reduced by high
salt concentrations that cause saturation of the
charges on the protein and DNA.

5.2.1. SYNGENES THAT REGULATE TRANSCRIPTION 

Syngene-encoded sequences that can be used
to regulate transcription are identified by screening
the desired random peptide library for binding to a
ligand of choice, where the ligand of choice is (or
comprises) a nucleic acid sequence (transcriptional
regulatory site) that regulates transcription; a
transcription factor that binds to a transcriptional
regulatory site and thereby regulates transcription;
or a protein binding partner of the transcription
factor, which binds to the transcription factor and
thereby inhibits the transcription factor's binding to
the transcriptional regulatory site.

It is well known that all eukaryotic genes
are regulated by proteins known as transcription
factors. In general, for each gene, several factors
are utilized to appropriately regulate the gene. A
large subset of these factors bind directly to DNA
sequences near the transcriptional start site of the
gene. Most of the sites that bind a given
transcription factor have a great deal of similarity
among themselves. The nucleotide sequences found in
common among all the sites for a given factor are
called that factor's "consensus sequence." For
example, the consensus sequence for NF-IL6 (also known
as C/EBP) is 5' T(T/G)NNGNAA(T/G) 3' (SEQ ID NO: 59)
(Zhang et al., 1994, Proc. Natl. Acad. Sci. USA
91:2225-2229). The consensus sequence for NF-κB is
5' GGGRNNYYCC (SEQ ID NO: 60) (Okamoto et al., 1994, J.
Biol. Chem. 269:8582-8589). The consensus sequence
for GATA-1, GATA-2, and GATA-3 is 5' (T/A)GATA(G/A)
(Zon et al., 1991, Proc. Natl. Acad. Sci. USA
88:10642-10646).
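To illustrate how degenerate consensus sequences of this kind can be used to locate candidate binding sites in a DNA sequence, the following sketch converts each consensus into a regular expression and scans both strands. It assumes the standard IUPAC ambiguity codes (R = A/G, Y = C/T, K = G/T, W = A/T, N = any base), under which the three consensus sequences above are written TKNNGNAAK, GGGRNNYYCC and WGATAR. The function names and the test sequence are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes used by the consensus sequences.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "K": "[GT]", "W": "[AT]",
    "M": "[AC]", "S": "[CG]", "N": "[ACGT]",
}

# Consensus sequences quoted in the text, rewritten with IUPAC codes.
CONSENSUS = {
    "NF-IL6": "TKNNGNAAK",   # T(T/G)NNGNAA(T/G)
    "NF-kB": "GGGRNNYYCC",
    "GATA": "WGATAR",        # (T/A)GATA(G/A)
}


def consensus_to_regex(consensus: str) -> "re.Pattern[str]":
    """Build a lookahead pattern so that overlapping sites are all reported."""
    body = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + body + "))")


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_sites(seq: str, consensus: str):
    """Return sorted (start, strand, matched_sequence) tuples.

    `start` is the 0-based position, on the given strand, of the leftmost
    base covered by the site; '-' hits report the sequence as read on the
    reverse strand.
    """
    seq = seq.upper()
    pattern = consensus_to_regex(consensus)
    width = len(consensus)
    hits = [(m.start(), "+", m.group(1)) for m in pattern.finditer(seq)]
    rc = reverse_complement(seq)
    hits += [(len(seq) - m.start() - width, "-", m.group(1))
             for m in pattern.finditer(rc)]
    return sorted(hits)


if __name__ == "__main__":
    test_sequence = "AAGGGACTTTCCAA"  # hypothetical sequence
    for name, consensus in CONSENSUS.items():
        print(name, find_sites(test_sequence, consensus))
```

A scan of this kind only identifies sequence matches; whether a matching site is functional, or is bound by a given factor or syngene-encoded peptide, would still have to be established experimentally.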






In addition, certain transcription factors
are not single proteins, but are multimers composed of
the products of different genes. For instance, the
AP-1 transcription site is activated by heterodimers
of c-jun and c-fos. Moreover, there are several
members of the c-jun family. For the NF-κB family,
the transcriptional activator may be composed of
either homo- or heterodimers of members of the NF-κB
family, including p65 (also known as Rel A) and p50.
Different NF-κB sites may be activated preferentially
by different combinations of p65 and/or p50 (Muroi et
al., 1993, J. Biol. Chem. 268:19534-19539; Zabel et
al., 1991, J. Biol. Chem. 266:252-260; Baldwin and
Sharp, 1988, Proc. Natl. Acad. Sci. USA 85:723-727;
Nakayama et al., 1992, Mol. Cell. Biol. 12:1736-1746;
Fujita et al., 1992, Genes and Development 6:775-787).
Thus, different genes may be regulated differentially
by specific combinations of transcription factor
homologs.

The activity of some transcription factors
is regulated by proteins that bind to and thus inhibit
the factor's activity. For example, the activity of
NF-κB can be inhibited by binding to the protein IF-κB
(Kerr et al., 1992, Current Opin. Cell. Biol. 4:496-
501). The activity of the AP-1 transcription factor
can be inhibited by binding of the factor IP-1 (Auwerx
and Sassone-Corsi, 1991, Cell 64:983-993). Often, as
in the case of IF-κB, the inhibitory protein is
located in the cytoplasm, where its binding to the
transcription factor sequesters that factor away from
the nucleus, thus preventing the factor from
activating transcription. Other transcription factors
that can be found in the cytoplasm, presumably in
inactive form, include: the glucocorticoid receptor,
the yeast transcription factor SWI5, and NF-AT (a
transcription factor for certain lymphokines and
cytokines).

Alternative splicing may give rise to
several versions of a transcription factor. Some of
these versions may inhibit rather than activate
transcription, and the interaction of different
versions of the same factor may produce a complex
pattern of regulation (Foulkes and Sassone-Corsi,
1992, Cell 68:411-414).

A given gene may be regulated by one or more
different transcription factors in a tissue or cell
type specific manner. For instance, it has been
demonstrated that at least three cis-elements (AP-1,
C/EBP (also known as NF-IL6) and NF-κB-like sites) are
involved in IL-8 gene activation (Okamoto et al.,
1994, J. Biol. Chem. 269:8582-8589; Mukaida et al.,
1990, J. Biol. Chem. 265:21128-21133; Mahe et al.,
1991, J. Biol. Chem. 266:13759-13763; Yasumoto et al.,
1992, J. Biol. Chem. 267:22506-22511). It was
concluded that the relative importance of the 3
elements varied in a cell type-specific manner. In
the human fibrosarcoma cell line 8387, the C/EBP and
NF-κB sites are essential, but the AP-1 site is not
(Mukaida et al., 1990, J. Biol. Chem. 265:21128-21133;
Mahe et al., 1991, J. Biol. Chem. 266:13759-13763).
In contrast, in MKN-45 cells, derived from human
gastric cancer, the AP-1 and NF-κB sites are
important, but the C/EBP site is unnecessary (Yasumoto
et al., 1992, J. Biol. Chem. 267:22506-22511).

Transcription factors have been shown to be
modular proteins. That is, they are composed of more
or less discrete domains that perform more or less
discrete functions. For example, most transcription
factors possess a DNA binding domain that mediates
binding to a transcriptional regulatory site. Also
common are transcriptional activator signals (TASs).
TASs are domains that, when bound to a transcriptional
regulatory site of a gene (by virtue of being linked
to a DNA binding domain), mediate the transcriptional
activation of the gene. Another common domain found
in transcription factors is a domain that permits the
dimerization of the factor. The dimerization is
sometimes homodimerization, sometimes
heterodimerization. An example of such a dimerization
domain would be a leucine zipper or a helix-loop-helix
domain.

NF-κB is a transcription factor that is a
heterodimer consisting of p50 and p65 subunits
(Lenardo and Baltimore, 1989, Cell 58:227-229;
Baeuerle, 1991, Biochem. Biophys. Acta 1072:63-80).
IF-κB (or IκB) is a cytoplasmic protein that binds to
and specifically inhibits NF-κB (Baeuerle and
Baltimore, 1988, Science 242:540-546; Baeuerle and
Baltimore, 1989, Genes & Dev. 3:1689-1698). Free
NF-κB migrates to the nucleus where it binds to the κB
site in the enhancer/promoter of various genes, such
as the T cell receptor β chain, interleukin-2 receptor
α chain, myosin heavy chain class I, and cytokine
genes such as those for beta-interferon, GM-colony
stimulating factor (CSF), G-CSF, interleukin-2, and
tumor necrosis factor-α and -β, thereby regulating
their expression (Lenardo and Baltimore, 1989, Cell
58:227-229; Baeuerle, 1991, Biochem. Biophys. Acta
1072:63-80).

Binding sites for the transcription factors
NF-κB and NF-IL6 have been shown to be important in
the regulation of a wide variety of genes that are
involved in health and disease. For example, it has
been suggested that derangements in the control of
expression of the IL-8 gene may be involved in the
pathogenesis of several inflammatory diseases (Okamoto
et al., 1994, J. Biol. Chem. 269:8582-8589). An NF-κB
binding site in the promoter of the IL-8 gene has been
shown to be involved in the transcriptional control of
the IL-8 gene (id.). The same NF-κB site has been
shown to be one of the targets through which the
immunosuppressant FK506 acts (id.). The ability to
regulate this site through the use of syngenes
encoding highly specific DNA binding domains that bind
to the site would be of great use in the management of
those inflammatory diseases in which IL-8 is involved.

The GATA family of transcription factors is
a group of zinc finger DNA binding proteins that
recognize related binding sites that contain the core
sequence GATA. The known members of this family are
GATA-1, GATA-2, GATA-3, GATA-GT1, and GATA-GT2 (Zon et
al., 1991, Proc. Natl. Acad. Sci. USA 88:10642-10646;
Maeda, 1994, J. Biochem. 115:6-14). GATA-1, GATA-2,
and GATA-3 are expressed primarily in hematopoietic
tissues such as the erythroid, megakaryocyte, T cell,
and mast cell lineages. Gene knockout studies have
shown that GATA-1 is required for erythroid
development in mice (Pevny et al., 1991, Nature
349:257-260). GATA-2 is found in non-hematopoietic
cells such as brain, liver, and endothelial cells
(Dorfman et al., 1992, J. Biol. Chem. 267:1279-1285).

GATA-GT1 and GATA-GT2 are found in gastric
parietal cells. They bind to a sequence motif in the
5' upstream regions of both the H⁺/K⁺-ATPase α and β
subunit genes. GATA-GT1 and GATA-GT2 may be involved
in gastric specific transcriptional regulation of many
proteins (Maeda, 1994, J. Biochem. 115:6-14).

The Ets family of transcription factors
includes at least 30 different DNA binding proteins in
species as evolutionarily distant as Drosophila and
humans. These proteins show pronounced amino acid
sequence similarities in an approximately 84 amino
acid region that corresponds to their DNA binding
domains. They recognize distinct but related DNA
binding sites that are about ten nucleotides long and
that share a common, short motif: (C/A)GGA(A/T). The
specificity of binding of the individual Ets proteins
is determined mainly by the other nucleotides in the
binding site.

Numerous genes have been shown to possess
functional Ets binding sites in their promoters
(Wasylyk, 1993, Eur. J. Biochem. 211:7-18). One
particularly interesting example of such a gene is the
T cell antigen receptor α chain gene (TCRα). It has
been shown that mutation of this Ets binding site
abolishes the transcription of a transfected TCRα gene
in the Jurkat T cell line. It has also been shown
that this site binds the product of the Ets-1 proto-
oncogene (Ho et al., 1990, Science 250:814-818). Ets-1
expression is developmentally regulated in parallel
with the expression of the TCRα gene, leading to
speculation that Ets-1 is the controlling factor in
TCRα expression (Leiden, 1993, Ann. Rev. Immunol.
11:539-570).



5.2.1.1. SYNGENES ENCODING PROTEINS 
THAT BIND TO DNA SEQUENCES THAT 
REGULATE TRANSCRIPTION 



In one embodiment, the invention involves 
the identification of syngene-encoded peptide 
sequences that bind to nucleic acid sequences that 
regulate (promote or alternatively inhibit) 
transcription (such nucleic acid sequences being 
collectively referred to as transcriptional regulatory 
sites). Such transcriptional regulatory sites are 
preferably those situated in vivo in or near the 
sequence of a preselected gene, the product of which 
is involved in health or disease. In a specific 
embodiment, the transcriptional regulatory site is a 
sequence recognized by a transcription factor. 
Oligonucleotides containing those transcriptional 
regulatory sites are synthesized and are used to 
screen a random peptide library. In this way, peptides 
are found that specifically bind the transcriptional 
regulatory sites. In addition, by screening against 
the appropriate transcriptional regulatory sites, a 
synthetic binding domain can be identified that 
specifically regulates the transcription of one or 
more genes of choice, but not of all the genes whose 
transcription is regulated by a natural transcription 
factor which recognizes the same transcriptional 
regulatory site as does the syngene-encoded binding 
domain. Such a binding domain, that specifically 
regulates one or more but not all of a family of 
transcriptional regulatory sites (a family referring 
to those transcriptional regulatory sites regulated by 
an identical natural transcription factor), is 
referred to herein as a highly specific DNA binding 
domain (HSDB). HSDBs encoded by syngenes are useful 
in enhancing or repressing the transcriptional 
activity of only one or a selected number of a 
preselected class of genes. 

Since the specific nucleotide sequence for a 
particular example of a given transcriptional 
regulatory site is often unique, or nearly so, the 
invention provides for the isolation of HSDBs that 
bind specifically to a particular transcriptional 
regulatory site within or near a particular gene. In 
a specific embodiment, a syngene product is targeted 
to the nucleus of a cell via a nuclear localization 
signal (NLS) where it binds to a specific 
transcriptional regulatory site within a gene whose 
activity it is desired to regulate for therapeutic 
reasons. Such binding interferes with the binding of 
a natural transcription factor to that site and thus 
blocks the transcription of the preselected gene. The 
syngene product in this case would not contain a 
transcription activating signal (TAS). 
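
The observation that a particular example of a regulatory site is often unique, or nearly so, can be illustrated by grouping known sites by their exact sequence. The following Python sketch is offered only as an example; it uses a handful of the NF-κB sites as read from Table 6, and the names SITES, genes_by_site and unique_sites are assumptions introduced here.

```python
from collections import defaultdict

# A few NF-kB sites and the genes in which they occur, as read from Table 6.
SITES = [
    ("GGGAAATTCC", "beta-interferon"),
    ("GGGATTTTCC", "interleukin-6"),
    ("TGGAATTTCC", "interleukin-8"),
    ("GGGACTTTCC", "HIV"),
    ("GGGACTTTCC", "mouse Ig kappa enhancer"),
    ("GGGACTTTCC", "SV40 enhancer"),
]


def genes_by_site(sites):
    """Group gene names by the exact nucleotide sequence of their site."""
    grouped = defaultdict(list)
    for sequence, gene in sites:
        grouped[sequence].append(gene)
    return dict(grouped)


def unique_sites(sites):
    """Return, in sorted order, the sequences that occur in exactly one listed gene."""
    grouped = genes_by_site(sites)
    return sorted(seq for seq, genes in grouped.items() if len(genes) == 1)


if __name__ == "__main__":
    grouped = genes_by_site(SITES)
    for seq in unique_sites(SITES):
        print(seq, grouped[seq][0])
```

Where the ten-nucleotide site itself is shared by several genes, as with GGGACTTTCC in this list, any discrimination between genes would have to depend on sequence flanking the site; the sketch does not model flanking sequence.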

In a similar fashion, it is possible to 
incorporate a TAS into the syngene product plus NLS. 
This will target the TAS to a transcriptional 
regulatory site of a particular preselected gene 
within a cell and thus activate that gene selectively. 
The site at which the syngene product binds could be a 
site that is naturally involved in the activation of 
the transcription of the preselected gene. 
Alternatively, the site could be a site that is not 
naturally involved in the transcriptional regulation 
of the gene. 
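
The combinations of signals described above can be summarized in a short sketch. The Python below is a hypothetical bookkeeping model, not a description of any actual construct; the class SyngeneProduct, its fields and the returned strings are assumptions chosen for illustration, and the model covers only products whose binding domain recognizes a transcriptional regulatory site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyngeneProduct:
    """Hypothetical description of the signals carried by a syngene product."""

    binding_target: str      # the transcriptional regulatory site that is bound
    has_nls: bool = False    # nuclear localization signal present
    has_tas: bool = False    # transcription activating signal present


def expected_effect(product: SyngeneProduct) -> str:
    """Summarize the effect described in the text for a site-binding product."""
    if not product.has_nls:
        # Without an NLS the product is not targeted to the nucleus.
        return "not targeted to nucleus"
    if product.has_tas:
        # NLS plus TAS: the TAS is delivered to the site and activates the gene.
        return "activates transcription"
    # NLS without TAS: binding interferes with the natural transcription factor.
    return "blocks transcription"


if __name__ == "__main__":
    repressor = SyngeneProduct("NF-kB site", has_nls=True)
    activator = SyngeneProduct("NF-kB site", has_nls=True, has_tas=True)
    print(expected_effect(repressor))
    print(expected_effect(activator))
```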

The invention encompasses a method for 
functionally identifying protein structural domains 
that can be combined with other functional domains to 
effect the production of a unique, non-naturally 
occurring protein that can act at specific cis-acting 
negative or positive transcriptional regulatory sites 
within eukaryotic cells. The invention is further 
directed to the syngenes encoding such a protein as 
well as to the uses of such a syngene. This invention 
in a particularly preferred embodiment is directed to 
the repression and activation of genes regulated, at 
least in part, by the transcriptional factors NF-κB 
and NF-IL6 (also known as C/EBP). In particular, this 
embodiment is directed to the genes for IL-6 and IL-8, 
whose products are cytokines that respond to different 
inflammatory signals and are regulated by both NF-κB 
and NF-IL6. In addition, regulation of the gene for 
the HLA class I locus, which responds to NF-κB, but 
not NF-IL6, is a subject of this invention. 

In a preferred embodiment, the 
transcriptional regulatory site in the vicinity of a 
preselected gene to which the encoded syngene product 
binds (either for the purpose of activating or 
repressing transcription of the preselected gene) is 
an NF-κB site (see Table 6, below). In another 
preferred embodiment, the site to which the encoded 
syngene peptide binds is an NF-IL6 site. 

In another method of regulating 
transcription, the encoded syngene product binds to a 
protein that is a transcriptional regulator (i.e., that 
binds to DNA and regulates transcription). For 
example, the encoded syngene product can bind to the 
NF-κB protein and thereby prevent the NF-κB protein 
from activating transcription. 

In another method of activating 
transcription, the encoded syngene product need not be 
localized to the nucleus. In this method, the product 
of the syngene functions in the cytoplasm where it 
binds to an endogenous binding partner of a 
transcriptional regulator. In one embodiment, the 
syngene product activates transcription of a 
preselected gene by binding to a transcription factor 
inhibitor that is involved in the inhibition of 
transcription of the preselected gene. By way of 
example, such a transcription factor inhibitor is IP- 

In a preferred aspect of the invention, the 
syngenes encode a binding domain that is highly 
specific for the DNA binding site of a transcription 
factor. Such syngenes encode proteins containing 
HSDBs that are identified by screening a random 
peptide library for binding to a target ligand, in 
which the ligand is a nucleic acid that contains a 
binding site recognized by the transcription factor. 
HSDBs will show greater specificity for a given 
transcription factor binding site than does the 
natural transcription factor that binds to that site. 
An HSDB selectively binds a single example, or a 
limited number of examples, of a given transcription 
factor's binding site, from a single gene or a limited 
number of genes, whereas the natural transcription 
factor will bind to sites in additional genes. 

Transcription factor binding sites are known 
for a very large number of genes and are suitable for 
use as target ligands. For example, the following 
tables represent examples of known transcription 
factors and their respective binding sites that can be 
used in the methods of the present invention to 
isolate binding domains specific for those factors or 
sites. Sequences encoding such binding domains could 
be built into syngene constructs that would be useful 
for the modulation of the activity of genes whose 
transcription is dependent upon the binding of 
transcription factors to those sites. 

For example, Table 1 shows that a DNA 
binding site for the Ets-like transcription factor 
Fli-1 is present in the promoter of the c-myc 
proto-oncogene. C-myc has been shown to be involved in 
the development of a wide variety of cancers (Marcu et 
al., 1992, Ann. Rev. Biochem. 61:809-860; Cole, 1986, 
Ann. Rev. Genet. 20:361-384). The ability to inhibit 
the activity of c-myc would be expected to have 
utility in the treatment of those cancers in which 
c-myc is upregulated. 

                          Table 2*

           DNA Sequences Recognized by Eukaryotic
            DNA-Binding Transcriptional Factors

Transcription                                SEQ ID  Tissue
Factor         Sequence Bound                NO.     Specificity

AP1            TGA[illegible]A                       Ubiquitous
ATF            TGACG[illegible]                      Ubiquitous
CRE-BP1        GGGGG                                 Erythrocytes
Ets-1          [illegible]                           B cells, T cells
GATA-1         (T/A)GATAR                            Erythroid cells,
                                                     Megakaryocytes,
                                                     Mast cells
HIVEN86A       TGGGGATTCCCA                  65      Activated T cells,
                                                     B-lymphocytes
MBF-1          (C/T)TAAAAATAA[illegible]     66      Myocytes
NF-κB          GGGA[illegible]               67      Ubiquitous
Oct-1          ATGCAAAT                              Ubiquitous
Sp1            [illegible]                           Ubiquitous

* From Gambari and Nastruzzi, 1994, Biochem. Pharm. 47(4):599-610

Table 2 gives the DNA sequences recognized 
by several ubiquitous transcription factors. Such 
ubiquitous transcription factor binding sites are 
ideal subjects for use as targets to select highly 
specific binding domains for incorporation into 
syngenes. This is because these factors regulate a 
wide variety of genes in many cell types. Using the 
methods of the present invention as disclosed in 
Section 6.1 and its subsections, one can obtain 
syngenes that encode products that are able to 
selectively bind to a particular binding site of a 
ubiquitous transcription factor in a particular gene. 
This allows for the regulation of that particular gene 
without disturbing the regulation of the many other 
genes that also contain DNA binding sites for the 
ubiquitous transcription factor. 

Table 3 shows a number of transcription 
factors, the DNA sequences they bind to, and the genes 
they regulate during T cell development and 
activation. These DNA sequences are attractive 
targets for use in obtaining highly specific DNA 
binding domains that recognize them. Because of the 
well known crucial role that T cells play in the 
immune system, syngenes that encode products that are 
able to regulate the activity of the genes listed in 
Table 3 would be expected to have a variety of uses in 
the treatment of disease. 

Table 4 shows a number of transcription 
factors that could be used as targets to obtain 
syngenes that encode products that would be useful in 
regulating the T cell receptor. 



                          Table 5*

      Specific Binding Sequences for Ets Family Members

                               Sequence             SEQ ID
                               (positions 1-10)     NO:

[factor name illegible]
Promoter sites:
  Polyomavirus PEA3            GCAGGAAGTG           69
  Stromelysin                  GCAGGAAGCA           70
                               CCAGGAAATG           71
  MSV LTR                      ACAGGATATC           72
                               AGCGGAAGTG           73
  TCR-alpha                    AGAGGATGTG           74
  TCR-beta                     ACAGGATGTG           75
  Ig κ 3' enhancer             TCAGGATGTG           76
  HTLV-I                       GGAGGAAATG           77
                               CCGGGAAGCC           78
                               CGCGGAAATG           79
Selected sites:
  Fisher et al., 1991          PuCCGGAAGTPy         80
                               (alternate nucleotides G, A and C
                               are also listed; positions illegible)
  Nye et al., 1992             (A/G)C(C/A)GGA(A/T)(G/A)(C/T)N    81
  Woods et al., 1992           AC(C/A)GGA(A/T)(G/A)TT            82

[factor name illegible]
Promoter sites:
  Polyoma virus PEA3           GCAGGAAGTG           83
  Stromelysin                  GCAGGAAGCA           84
                               CCAGGAAATG           85
  Woods et al. consensus       CCA[illegible]

ERG
Promoter sites:
  E74                          ACCGGAAGTA           87

GABP
Promoter sites:
  HSV ICP4                     AACGGAAGCG           88
                               AGCGGAAACC           89

PEA3
  Polyoma virus PEA3           GCAGGAAGTG           90

ELK1/SAP1
Promoter sites:
  Fos                          ACAGGATGTC           91
  E74                          ACCGGAAGTA           92

E74
Selected site:
  Urness & Thummel, 1990       PyCAGGAAGTN          93

ELF1
Promoter sites:
  IL-2 NFAT-1                  AGAGGAAAAA           94
  IL-2 NFIL-2B                 GGAGGAAAAA           95
  HIV-2 CD3R                   ACAGGAACAG           96

PU1
Promoter sites:
  MHCII                        AGAGGAACTT           97
  Ig-kappa 3' enhancer         TGAGGAACTG           98

The 10-nucleotide sequence surrounding the GGA core is shown. 
Sequences were reported by Wasylyk et al., 1990, Nature 
[illegible]; [illegible], EMBO J. 10:1127-1134; [illegible] 
(MSV); Ho et al., 1990, Science 250:814-818; [illegible], 1992, 
J. [illegible] 175:1391-1399 (TCR β); [illegible] et al., 1991, 
J. Virol. 65:5513-5523 (HTLV1); Fisher et al., 1991, Oncogene 
6:2249-2254 (Ets1-selected site); [illegible] (Ets1-selected 
site); [illegible] and Rao, 1991, Oncogene 6:2285-2289 (Erg); 
Thompson et al., 1991, Science 253:762-768 (GABP); Xin et al., 
1992, [illegible]; [illegible] Reddy, 1992, Oncogene 7:65-70 
(Elk1, E74); Dalton and Treisman, 1992, Cell 68:597-612 (SAP1, 
Fos); Urness and Thummel, 1990, Cell 63:47-61 (E74); 
[illegible] et al., 1992, J. Exp. Med. 175:1391-1399 (Elf1); 
[illegible] (PU1, Ig-κ 3' enhancer). 

* From Wasylyk et al., 1993, Eur. J. Biochem. 211:7-18 




Table 5 shows a large number of known Ets 
family transcription factor binding sites. Among the 
genes listed in Table 5, genes such as fos (involved 
in growth control) and interleukin-2 (involved in the 
control of the immune system) are attractive 
candidates for regulation by syngene products. 

Examples of NF-κB sites which are suitable 
for use as a ligand of choice are shown in Table 6 
(see Muroi et al., 1993, J. Biol. Chem. 
268:19534-19539; Lenardo and Baltimore, 1989, Cell 
58:227-229). 



                          Table 6

                   NF-κB Transcriptional
                 Regulatory Sites in Genes

DNA SEQUENCE      SEQ ID NO   GENE

GGGAAATTCC        99          beta-interferon
GGGATTTTCC        100         interleukin-6
TGGAATTTCC        101         interleukin-8
GGGACTTCCC        102         MHC class II, I-Eα(d)
GGGAACTACC        103         GM-colony stimulating factor
GGGGAATCTC        104         G-colony stimulating factor
GGGAATCCTT        105         tumor necrosis factor-α1
                              (nucleotide position -850)
GGGGCTTTCC        106         tumor necrosis factor-α2
                              (nucleotide position -510)
GGGGCTGCCC        107         tumor necrosis factor-α3
                              (nucleotide position -610)
GGGAATATTC        108         Fc?R
GGGGATTCCC(C)     109         MHC class I (H2-K(b))
GGGGATTCCCC       109         HLA Class I genes
GGGACTTTCC        110         HIV NF-κB
GGGACTTTCC        110         Mouse Igκ enhancer
GGGACTTTCC        110         SV40 enhancer
GGGACTTTCC        110         CMV(4)
GGGACTTTCC        110         β2-microglobulin
GGGACTTTCC        110         serum amyloid A
GGGGATTTCC        111         human Igκ enhancer
GGGGATTTCC        111         CMV(3)
GGGNACTTTCC       110         CMV(2)
GGGATTTCAC        112         interleukin-2 (IL-2)
GGGATTTCCT        113         mouse IL-2 receptor α
GGGAATCTCC        114         human IL-2 receptor α
GGGRNNYYCC*       115         consensus motif

* R = A or G; N = A, G, T or C; Y = C or T
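
The degenerate symbols used for the consensus motif in Table 6 lend themselves to a position-by-position comparison. The following Python sketch is illustrative only; the names CODES, NFKB_CONSENSUS and matches_consensus are assumptions introduced here, and the comparison requires a site of the same length as the consensus.

```python
# Degenerate nucleotide codes as defined in the Table 6 footnote.
CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG",    # A or G
    "Y": "CT",    # C or T
    "N": "ACGT",  # any nucleotide
}

NFKB_CONSENSUS = "GGGRNNYYCC"


def matches_consensus(site: str, consensus: str = NFKB_CONSENSUS) -> bool:
    """Return True if site has the consensus length and fits it at every position."""
    site = site.upper()
    if len(site) != len(consensus):
        return False
    return all(base in CODES[code] for base, code in zip(site, consensus))


if __name__ == "__main__":
    for gene, site in [
        ("HIV", "GGGACTTTCC"),
        ("beta-interferon", "GGGAAATTCC"),
        ("interleukin-8", "TGGAATTTCC"),
    ]:
        print(gene, site, matches_consensus(site))
```

Some listed sites, such as the interleukin-8 site, depart from the consensus at one position, which is consistent with the consensus being a summary of known sites and not a strict requirement for binding.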



For those genes of interest for which 
transcription factor binding sites are not known, as 
in the case, for example, of a newly discovered gene, 
well known methods will allow the determination of 
transcription factor binding sites that are involved 
in the regulation of the gene. Such methods include, 
for example, comparing the DNA sequences in the 5' 
region of the gene to the sequences of known 
transcription factor binding sites, gel shift assays 
to determine if known transcription factors bind to 
potential transcription factor binding sites in the 
gene, and mutagenesis of a suspected binding site 
followed by assays to determine if the expression of 
the mutagenized gene has been affected by the 
mutagenesis. 
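
The first of these methods, comparison of a 5' region with known binding site sequences, can be sketched as a simple search on both strands. The Python below is a minimal example under stated assumptions: the function names are introduced here, only exact matches are reported, and the 5' region shown is a hypothetical sequence invented for the example. The two motifs are the Oct-1 sequence of Table 2 and an NF-κB site of Table 6.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_sites(region: str, motifs: dict) -> list:
    """Report every exact occurrence of each motif on either strand.

    Returns (name, start, strand) tuples sorted by start, where start is
    counted from 0 on the strand as written and strand is "+" or "-".
    """
    region = region.upper()
    hits = []
    for name, motif in motifs.items():
        patterns = (("+", motif.upper()), ("-", reverse_complement(motif)))
        for strand, pattern in patterns:
            start = region.find(pattern)
            while start != -1:
                hits.append((name, start, strand))
                start = region.find(pattern, start + 1)
    return sorted(hits, key=lambda hit: hit[1])


if __name__ == "__main__":
    # Hypothetical 5' region, invented for this example.
    region = "TTGGGACTTTCCAAATTTGCATGG"
    motifs = {"NF-kB": "GGGACTTTCC", "Oct-1": "ATGCAAAT"}
    for hit in find_sites(region, motifs):
        print(hit)
```

A match found in this way identifies only a potential binding site; as noted above, gel shift assays and mutagenesis are used to determine whether the site is actually bound and functional.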

In an alternative embodiment of the invention, 
the molecule comprising the transcriptional regulatory 
site that constitutes the ligand of choice used to 
screen the random peptide library is a site that is 
not known to be bound by a protein transcription 
factor, but which is situated close to (e.g., within 
50 nucleotides of) the initiation site of the gene, 
and which has the ability to regulate transcription of 
such gene upon binding by a syngene-encoded product. 
Such a syngene-encoded product would be expected to 
sterically hinder the ordered process of transcription 
factor and RNA polymerase binding to this region of 
the gene that is necessary for transcription to occur. 

5.2.1.2. SYNGENES FOR REGULATION OF 
TRANSCRIPTION FACTORS AND 
TRANSCRIPTION FACTOR INHIBITORS 

The invention provides a method for 
identifying a syngene that encodes a protein that 
binds to a ligand of interest comprising screening a 
random synthetic peptide library to identify a peptide 
that binds to a ligand of interest, in which the 
ligand is (1) a DNA region that functions to regulate 
transcription; (2) a protein that is a transcriptional 
regulator that functions by binding to DNA; or (3) a 
protein modulator that binds to and inactivates the 
function of a protein transcriptional regulator. 



In cases such as that of NF-κB and IκB 
(Kerr et al., 1992, Current Opinion in Cell Biology 
4:496-501) or of AP-1 and IP-1 (Auwerx and 
Sassone-Corsi, 1991, Cell 64:983-993), the existence 
of a transcription factor and an inhibitor of that 
transcription factor affords the opportunity for 
regulating the transcription of genes that depend upon 
the transcription factor for their activity. This can 
be done by developing syngenes that encode binding 
domains that are specific for either the transcription 
factor or its inhibitor. 

If a random peptide library is screened with 
a transcription factor as a ligand, binding domains 
(and their encoding nucleic acids) will be identified 
that are capable of specifically binding the 
transcription factor. In that respect, such binding 
domains mimic transcription factor inhibitors. Such 
binding domains can function as inhibitors of 
transcription in much the same way as the natural 
transcription factor inhibitors do. If such a binding 
domain were built into a syngene-encoded product 
lacking a nuclear localization signal, the syngene 
product containing such a binding domain would be 
expected to be found predominantly in the cytoplasm 
where it would be able to sequester the transcription 
factor away from the nucleus, thus nullifying the 
transcription factor's activity. However, it is 
envisioned that such inhibition can also occur in the 
nucleus, e.g., if a nuclear localization signal were 
employed as part of the encoded syngene product. 

For example, if the library were screened 
with the protein NF-κB, binding domains would be 
identified that could function, when part of a 
syngene-encoded product, much in the way IκB 
functions. Likewise, when a library is screened with 
either c-fos or c-jun (the subunits of AP-1), binding 
domains would be identified that could be used to 
create IP-1-like peptides encoded by syngenes. 

If random peptide libraries were screened 
with transcription factor inhibitors according to the 
invention, binding domains would be identified that, 
when expressed from a syngene, would bind to the 
inhibitor, thus preventing it from binding to the 
transcription factor. This would free the 
transcription factor, allowing it to migrate into the 
nucleus and activate transcription. In this way, 
syngenes encoding binding domains specific for 
transcription factor inhibitors could be used as 
activators of transcription. 

5.2.2. SYNGENES FOR REGULATION OF APOPTOSIS 

Syngenes may be used to control apoptosis. 
Apoptosis, the process of programmed cell death, is 
seen in a wide range of organisms where it is involved 
in the development of the nervous system, the 
remodeling of limb buds, the destruction of virally 
infected cells, the clonal deletion of B and T cells 
in the immune system, and many other processes. 
Programmed cell death thus appears to be a normal part 
of development and physiology. For reviews, see Kerr 
et al., 1972, Br. J. Cancer 26:239-257; Wyllie et al., 
1980, Int. Rev. Cytol. 68:251-306; Raff, 1992, Nature 
356:397-400; Vaux, 1993, Proc. Natl. Acad. Sci. USA 
90:786-789. 

Since apoptosis is a normal and important 
part of many different biological processes, it is 
important for the health of an organism that apoptosis 
be properly regulated. It appears that some instances 
of disease can arise as a result of improper 
regulation of cell death. For example, Alzheimer's 
and Parkinson's diseases are associated with the 
premature death of certain neurons (Jenner, 1989, J. 
Neurol. Neurosurg. Psychiatry 22-28; Kosik, 1992, 
Science 256:780-783); inhibition of programmed cell 
death may be involved in autoimmune disease 
(Goldstein et al., 1991, Immunol. Rev. 121:29-65; 
Watanabe et al., 1992, Nature 356:314-317; Strasser et 
al., 1991, Proc. Natl. Acad. Sci. USA 88:8661-8665); 
and failure of cell death may be involved in cancer 
(Korsmeyer, 1992, Blood 80:879-886). Given the 
importance of apoptosis in health and disease, it 
would be of great value to have a method of regulating 
apoptosis. 

There appears to be a family of related 
proteins involved in the control of apoptosis in 
mammalian cells. The two best-studied members of this 
family are bcl-2 and bax. Bcl-2 was first identified 
as a gene on human chromosome 18 that was involved in 
the t(14;18) chromosomal translocations observed in 
follicular B-cell lymphomas (Tsujimoto et al., 1985, 
Science 229:1390-1393; Bakhshi et al., 1985, Cell 
41:899-906; Cleary and Sklar, 1985, Proc. Natl. Acad. 
Sci. USA 82:7439-7449). 

Together, bcl-2 and bax proteins control the 
process of apoptosis in response to various stimuli. 
It has been postulated that the ratio of bcl-2 to bax 
determines whether a cell will initiate the program of 
apoptosis in response to apoptotic stimuli or will 
ignore such stimuli and continue to grow, or at least 
survive (Oltvai et al., 1993, Cell 74:609-619). A 
high ratio of bcl-2 to bax leads to survival; a low 
ratio to death. The interaction of bcl-2 and bax is 
thought to depend on the formation of heterodimers of 
the proteins encoded by these two genes. A study in 
which mutations were engineered in two domains of 
bcl-2 found a correlation between the mutant bcl-2 
proteins' ability to heterodimerize with bax and the 
mutant proteins' ability to prevent apoptosis (Yin et 
al., 1994, Nature 369:321-323). 

The two domains in the human bcl-2 protein 
that appear to interact with bax are amino acids 
136-156 and amino acids 187-202 (ibid.). By using one 
of these two domains as a ligand to screen a random 
peptide library, it should be possible to isolate 
binding domains that are specific for the bcl-2 
protein. These binding domains could be used to make 
syngenes that would be useful in controlling the 
process of apoptosis. Alternatively, it is possible 
to use other regions of the bcl-2 protein, or even the 
entire protein, to find binding domains that are 
specific for the bcl-2 protein. Such binding domains 
could also be used to make syngenes that would be 
useful in the control of apoptosis. Likewise, one 
could screen a random peptide library with the bax 
protein to find binding domains specific for bax. 
These binding domains would also be useful in making 
syngenes for use in regulating apoptosis. 

As an alternative way of regulating 
apoptosis, one could also regulate the transcription 
of either bcl-2 or bax by the use of syngenes directed 
to a transcription factor binding site that is 
involved in the regulation of the transcription of 
those genes. The methods described herein for the use 
of syngenes to regulate transcription could be easily 
adapted to regulate the transcription of the bcl-2 
gene or the bax gene. 

5.2.3. OTHER TARGETS FOR SYNGENE PRODUCTS 

Although a preferred embodiment of the 
invention is directed to syngenes whose products bind 
to transcriptional regulatory sites, other embodiments 
of the invention encompass a plethora of uses. In 
particular, it should be emphasized that syngene 
products are in no way restricted to functioning 
solely in the nucleus. As discussed below, syngenes 
are useful for regulating activities that take place 
in the cytoplasm, or on the cell's surface, as well. 

A syngene product may have one or more of a 
variety of functional properties. For example, the 
product may be a signal transduction inhibitor. Such a 
signal transduction inhibitor would block the activity 
of a signal pathway such as that which leads from a 
cell membrane receptor through various intermediate 
kinases, phosphatases or other molecules to the 
nucleus, where transcription of particular genes is 
affected. An example of such a signal transduction 
inhibiting syngene would be a syngene the product of 
which binds to and blocks the function of a tyrosine 
kinase membrane receptor. Other types of signalling 
molecules may also be targets of the syngenes of the 
invention. Examples of such signalling molecules are 
cytoplasmic kinases (for example, src or raf), 
adenylyl cyclases, guanylyl cyclases and the like. 

Many of the proteins that participate in 
signalling possess SH2 and/or SH3 domains. These SH2 
and SH3 domains are involved in protein/protein 
interactions that are important for the signalling 
process (Anderson et al., 1990, Science 250:979-982). 
It may be that syngenes that encode a protein having 
binding domains specific for SH2 and/or SH3 domains 
will be especially useful for regulating the various 
signalling pathways of cells. 

In another embodiment, the syngenes, via 
expression of their encoded proteins, can be used to 
block or down-regulate the translation of a specific 
mRNA. Alternatively, the syngene can be used to 
stabilize or destabilize a specific RNA. In each of 
the aforementioned cases, the syngene encodes a 
protein that incorporates a binding domain that has 
been selected for binding to a portion of the RNA 
sequence that is to be down-regulated, stabilized or 
destabilized. 

In another embodiment, a syngene is created 
whose product mimics the structure of a known hormone 
receptor but that has a different DNA binding site 
from that of the known hormone receptor. The DNA 
binding site of the known hormone receptor is replaced 
with an HSDB directed against a DNA binding site that 
represents a transcriptional regulatory site of a 
preselected gene. The effect of the use of such a 
syngene is that the syngene product (which mimics the 
hormone receptor) allows the preselected gene to be 
regulated by the hormone. The hormone binds to the 
syngene product and the syngene product-hormone 
complex is translocated to the nucleus of a cell where 
it binds to the transcriptional regulatory site of the 
preselected gene, thus activating or repressing the 
gene. 

Syngenes encoding binding domains directed 
to cyclins or cyclin dependent kinases can be used to 
inhibit the progression of cells through the cell 
cycle. Syngene products containing binding domains 
directed to DNA polymerases and associated factors 
involved in DNA replication can be used to stop or 
retard DNA synthesis, thus stopping or retarding cell 
division. 

Another potential target for syngene 
products of the present invention is the cellular 
trafficking machinery that directs or sorts proteins, 
for example, into the Golgi and to their final 
destinations. The proper subcellular localization of 
many proteins depends on short amino acid sequences 
present in the proteins. For example, the 
tetrapeptide KDEL (SEQ ID NO: 116) targets proteins to 
the lumen of the endoplasmic reticulum. The 
tetrapeptide YQRL (SEQ ID NO: 117), found in the 
membrane protein TGN38 of the trans Golgi network, has 
been demonstrated to be necessary and sufficient to 
target membrane proteins to the trans Golgi (Luzio and 
Banting, 1993, Trends in Biochem. Sci. 18:395-398). 
Such peptide targeting sequences are appropriate 
ligands for use in isolating binding domains for 
incorporation into syngenes. 

Syngenes can also be useful in regulating 
the interactions of cells with the extracellular 
matrix. Such interactions depend on binding between 
the components of the matrix and receptors on the 
surface of cells. One of the most important 
components in certain types of matrices is the protein 
laminin. By interacting with specific receptors on 
the surface of cells, laminin serves as a bridge 
between the cells and the matrix (Martin and Timpl, 
1987, Ann. Rev. Cell Biol. 3:57-85). 

The interaction between laminin and its 
receptor has been demonstrated to be important in the 
migration of axons and dendrites of neurons. The 
axons and dendrites, guided by the specific 
interaction between laminin in the extracellular 
matrix and laminin receptors on the cell surfaces of 
the neurons, migrate along extracellular matrix 
pathways to their destinations (Bray and Hollenbeck, 
1988, Ann. Rev. Cell Biol. 4:43-61). 

By screening a random peptide library for 
binding domains that are specific for laminin or its 
receptor, it is possible to isolate binding domains 
that can be incorporated into the peptides encoded by 
syngenes that are useful in modulating the process of 
axonal or dendrite migration as well as other 
processes that depend on the interaction between 
laminin and its receptor. 

The regulation of the intracellular 
movement of membranes and their contents in mammalian 
cells is another area where syngenes may be employed. 
At least seven classes of GTPases are thought to be 
involved in such membrane transport processes as 
movement of membrane vesicles from the endoplasmic 
reticulum (ER) through the Golgi to the plasma 
membrane (or to intracellular organelles) and 
endocytosis. The seven classes of GTPases are: 
(1) Sar proteins 
(2) Arf proteins 
(3) Rab proteins 
(4) Rac proteins 
(5) Rho proteins 
(6) heterotrimeric G proteins 
(7) dynamin 

All of these proteins have the ability to 
bind GDP or GTP. They cycle between a state in which 
they are bound to GDP and a state in which they are 
bound to GTP. Whether they are bound to GDP or GTP 
determines how these proteins will function, with the 
general rule being that they are inactive when bound 
to GDP and active when bound to GTP. 

Sar proteins are believed to be involved in 
membrane vesicle budding from the ER and transport of 
the budded vesicles to the Golgi. A postulated 
mechanism envisions Sar proteins in their GDP-bound 
form interacting with a specific receptor on the 
transitional elements of the ER. Transitional 
elements are specialized sites for the export of newly 
synthesized proteins from the ER. This is followed by 
the binding of four other proteins (Sec 13, Sec 24, 
Sec 25, and a 105 kd protein) to the bound Sar to form 
a nascent vesicle. The nascent vesicle then buds off 
the ER in a process accompanied by the exchange of the 
bound GDP for GTP. This exchange is catalyzed by a 
Sar-specific guanine nucleotide exchange factor (GEF). 
The budded vesicle then moves to the Golgi where it 
fuses with the membrane of the cis Golgi network in a 
process that is dependent upon the hydrolysis of the 
bound GTP to GDP. This hydrolysis is catalyzed by a 
Sar-specific GTPase activating protein (GAP). See 
Figure 2A of Nuoffer and Balch, 1994, Ann. Rev. 
Biochem. 63:949-990. 

Arf proteins are believed to function in a
manner similar to Sar proteins. However, whereas Sar
proteins are involved in vesicle budding and transport
between the ER and the Golgi, Arf proteins are thought
to mediate transport between compartments of the
Golgi. The mechanism of action of Arf is thought to
involve binding of Arf (in its GDP-bound state) to a
specific receptor on the membrane of a Golgi vesicle.
This is followed by binding to Arf of coat proteins in
a process that is dependent upon exchange of the bound
GDP for GTP. This exchange is catalyzed by an
Arf-specific GEF. Movement of the vesicle between Golgi
compartments then occurs, followed by docking of the
vesicle at its target membrane in the Golgi. Finally,
fusion of the vesicle with its target occurs in a
process driven by hydrolysis of the bound GTP. This
hydrolysis is catalyzed by an Arf-specific GAP. See
Figure 2B of Nuoffer and Balch, 1994, Ann. Rev.
Biochem. 63:949-990.

Sar proteins, Arf proteins, and their 
associated GEFs and GAPs are suitable targets for the 
action of syngenes. A random peptide library could be
screened for binding domains that are specific for
these proteins. Such binding domains could be
incorporated into the peptides encoded by syngenes 
that would be useful for regulating membrane traffic 
from the ER to the Golgi. 

In contrast to Sar and Arf, dynamin does not
seem to be involved in transport of membrane vesicles
from the ER to the Golgi. Instead, dynamin seems to
participate in the receptor-mediated endocytotic
pathway involving clathrin-coated vesicles. Mutants
of dynamin that are unable to bind or hydrolyze GTP
block receptor-mediated endocytosis in mammalian cells
(van der Bliek et al., 1993, J. Cell Biol. 122:553-563;
Herskovits et al., 1993, J. Cell Biol. 122:565-578).
It has been proposed that dynamin may be
involved in the rapid endocytosis of synaptic vesicles
after exocytosis in nerve terminals (Robinson et al.,
1993, Nature 365:163-166). Syngenes containing
binding domains specific for dynamin could be used to
regulate the process of endocytosis.

Rab proteins are a large class of proteins
(more than 30 members have been discovered in
mammalian cells alone) that are involved in membrane
trafficking. It has been possible to implicate
specific members of the Rab class in specific aspects
of membrane movement. For example, much evidence
indicates that Rab 1 is involved in the early phases
of the secretory pathway such as vesicle budding from
the ER and vesicle movement from the ER to the cis
Golgi (Peter et al., 1993, J. Cell Biol. 122:1155-1168;
Plutner et al., 1991, J. Cell Biol. 115:31-43;
Tisdale et al., 1992, J. Cell Biol. 119:749-761;
Plutner et al., 1990, EMBO J. 10:785-792). Mutated
Rab 2 proteins have been shown to be trans-dominant
inhibitors of membrane transport from the ER to the
Golgi (Tisdale et al., 1992, J. Cell Biol. 119:749-761).
Rab 6 is found in a complex with a 62 kd
protein in the membranes of the trans Golgi network.
Antibodies to Rab 6 (or to the 62 kd protein)
inhibited the budding of vesicles from the trans Golgi
to the plasma membrane, suggesting that Rab 6 may be
involved in this process (Jones et al., 1993, J. Cell
Biol. 122:775-788). As Section 6.1 and its
subsections show, by using the methods of the present
invention, one can isolate binding domains that can
distinguish between closely related ligands. One can
then incorporate those binding domains into syngenes.
In this way, it should be possible to make syngenes
containing binding domains that are specific for a
single member of the Rab family. These syngenes can
then be used to modulate those aspects of membrane
trafficking in which that member participates without
affecting other aspects.

Evidence supports the involvement of
heterotrimeric G proteins in some aspects of membrane
trafficking. For example, agonists and antagonists of
Gα (one of the subunits of heterotrimeric G proteins)
have been shown to affect export from the ER
(Schwaninger et al., 1992, J. Cell Biol. 119:1077-1096).
In PC12 and MDCK cells, Gαi3 inhibits membrane
budding from the trans Golgi while Gαs promotes budding
(Leyte et al., 1992, Trends Cell Biol. 2:91-94;
Pimplikar and Simons, 1993, Nature 362:456-458; Barr
et al., 1991, FEBS Lett. 294:239-243). Syngenes
containing binding domains specific for the
above-mentioned heterotrimeric G proteins would be useful in
regulating these aspects of membrane movement.

5.3. IDENTIFICATION OF SYNTHETIC
SEQUENCES WHICH MEDIATE BINDING

When a peptide from a random peptide library 
has been identified as a binder for a particular 
target ligand of interest according to the methods of 
the invention, it may be useful to determine what
region(s) of the expressed peptide sequence is (are)
responsible for binding to the target ligand. Such
analysis can be conducted at two different levels, 
i.e., the nucleotide sequence and amino acid sequence 
levels. 

By molecular biological techniques it is
possible to verify and further analyze a ligand
binding peptide at the level of the oligonucleotides.
First, the inserted oligonucleotides can be cleaved
using appropriate restriction enzymes and religated
into the original expression vector, and the expression
product of such vector screened for ligand binding to
identify the oligonucleotides that encode the binding
region of the peptide. Second, the oligonucleotides
can be transferred into another vector, e.g., from
phage to phagemid or to p340-1D or to pLamB plasmid.
The newly expressed fusion proteins should acquire the
same binding activity if the domain is necessary and
sufficient for binding to the ligand. This last
approach also assesses whether or not flanking amino
acid residues encoded by the original vector (i.e.,
fusion partner) influence peptide binding in any
fashion. Third, the oligonucleotides can be
synthesized, based on the nucleotide sequence
determined for the syngene in the library that encodes
the binding peptide, amplified by cloning or PCR
amplification using internal and flanking primers,
cleaved into two pieces, and cloned as two half-syngene
fragments. In the foregoing manner, the inserted
oligonucleotides are subdivided into two equal halves.
If the peptide domain important for binding is small,
then one recombinant clone would demonstrate binding
and the other would not. If neither shows binding,
then either both are important or the essential
portion of the domain spans the middle (which can be
tested by expressing just the central region).
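
The half-syngene subdivision just described can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the choice to split on a codon boundary, the definition of the "central region" as the middle half of the insert, and the example sequence are all hypothetical and are not taken from the specification.

```python
def split_insert(insert_dna: str) -> dict:
    """Split an in-frame insert into two codon-aligned halves and a central region.

    Assumption: the insert is a whole number of codons, and the central
    region is taken as the middle half of the insert, spanning the junction.
    """
    if len(insert_dna) % 3 != 0:
        raise ValueError("insert must contain a whole number of codons")
    n_codons = len(insert_dna) // 3
    mid = (n_codons // 2) * 3        # split point, kept on a codon boundary
    quarter = (n_codons // 4) * 3    # start of the central region
    return {
        "left": insert_dna[:mid],
        "right": insert_dna[mid:],
        "central": insert_dna[quarter:quarter + mid],
    }


def interpret(left_binds: bool, right_binds: bool) -> str:
    """Map the binding results of the two half clones to the conclusions in the text."""
    if left_binds and not right_binds:
        return "binding domain lies in the left half"
    if right_binds and not left_binds:
        return "binding domain lies in the right half"
    if not left_binds and not right_binds:
        return "both halves are needed or the domain spans the middle; test the central region"
    # This case is not discussed in the text; it is included only for completeness.
    return "each half binds on its own"


# Hypothetical 8-codon insert used purely as an example.
parts = split_insert("ATGGCTTCTGGTGCTTCTAAACGT")
print(parts)
print(interpret(left_binds=False, right_binds=False))
```

Each fragment stays in frame, so the two halves and the central region could each be expressed as a fusion and screened separately.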

Alternatively, by synthesizing peptides 
corresponding to the deduced sequence of the binding 
peptide, the binding domains can be analyzed. First, 
the entire peptide should be synthesized and assessed 
for binding to the target ligand to verify that the 
peptide is necessary and sufficient for binding.
Second, short peptide fragments, for example,
overlapping 10-mers, can be synthesized, based on the
amino acid sequence of the random peptide binding 
domain, and tested to identify those binding the 
ligand. 
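
As an illustration of the overlapping-fragment approach, the following Python sketch enumerates overlapping 10-mers from a peptide sequence. The function name, the `step` parameter (the offset between successive fragments), and the 14-residue example peptide are assumptions introduced for this example; the text specifies only that the fragments overlap.

```python
def overlapping_peptides(sequence: str, length: int = 10, step: int = 1) -> list:
    """Return overlapping fragments of `length` residues, offset by `step`."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    if len(sequence) <= length:
        return [sequence]
    last_start = len(sequence) - length
    fragments = [sequence[i:i + length] for i in range(0, last_start + 1, step)]
    # Add a final fragment so the carboxy-terminal residues are always covered.
    if last_start % step != 0:
        fragments.append(sequence[-length:])
    return fragments


# Hypothetical 14-residue peptide, not a sequence from the specification.
for fragment in overlapping_peptides("ACDEFGHIKLMNPQ", length=10, step=2):
    print(fragment)
```

A smaller step gives finer mapping of the binding region at the cost of more peptides to synthesize and test.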

In addition, in certain instances, linear
motifs may become apparent after comparing the primary
structures of different binding peptides from the
library having binding affinity for a target ligand.
The contribution of these motifs to binding can be
verified with synthesized peptides in competition
experiments (i.e., determine the concentration of
peptide capable of inhibiting 50% of the binding of
the phage to its target; IC50). Conversely, the motif
or any region suspected to be important for binding
can be removed from or mutated within the DNA encoding
the random peptide insert and the altered displayed
peptide can be retested for binding.
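
One simple way to estimate an IC50 from such a competition experiment is to interpolate between the two peptide concentrations that bracket 50% binding. The Python sketch below does this on a logarithmic concentration scale. The interpolation method, the function name, and the example data points are assumptions made for illustration; the specification does not prescribe a fitting procedure, and the numbers are not experimental results.

```python
import math


def estimate_ic50(concentrations, fraction_bound):
    """Estimate IC50 by log-linear interpolation.

    `fraction_bound` is phage binding relative to a no-peptide control
    (1.0 = no inhibition). Concentrations must be positive. Returns None
    if the data never cross 50% binding.
    """
    points = sorted(zip(concentrations, fraction_bound))
    for (c_lo, f_lo), (c_hi, f_hi) in zip(points, points[1:]):
        if f_lo >= 0.5 >= f_hi and f_lo != f_hi:
            t = (f_lo - 0.5) / (f_lo - f_hi)
            log_ic50 = math.log10(c_lo) + t * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None


# Hypothetical data: peptide concentration (arbitrary units) vs. fraction of phage bound.
print(estimate_ic50([1, 100], [0.75, 0.25]))
```

Comparing the IC50 of a peptide containing the motif with that of a peptide in which the motif is altered gives a quantitative measure of the motif's contribution.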

Furthermore, once the binding domain of a 
random peptide has been identified, differently 
displayed binding domains can be created by isolating 
and fusing the binding domain of one random peptide to
a new effector domain. The biologically or chemically
active effector domain of the peptide can thus be
varied. Alternatively, the binding characteristics of
an individual peptide can be modified by varying the
binding domain sequence to produce a related family of
peptides with differing properties for a specific
ligand.

Moreover, in a method of directed evolution,
the identified random peptides can be improved by
additional rounds of random mutagenesis, selection,
and amplification of the nucleotide sequences encoding
the binding domains. Mutagenesis can be accomplished
by creating and cloning a new set of oligonucleotides
that differ slightly from the parent sequence, e.g.,
by 1-10%. Selection and amplification are achieved as
described above. By way of example, to verify that
the isolated peptides have improved binding
characteristics, mutants and the parent phage,
differing in their lacZ expression, can be processed
together during the screening experiments. Alteration
of the original blue-white color ratios during the
course of the screening experiment will serve as a
visual means to assess the successful selection of
enhanced binders. This process can go through
numerous cycles.
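
The mutagenesis step can be pictured with a small Python simulation that generates variants differing from a parent sequence at a chosen per-base rate (for instance 0.01 to 0.10, matching the 1-10% range above). This is a computational sketch only: in practice the variation is introduced during oligonucleotide synthesis, and the function names, the uniform choice among the three alternative bases, and the parent sequence shown are assumptions made for this example.

```python
import random


def mutagenize(parent: str, rate: float, rng: random.Random) -> str:
    """Return a variant in which each base is substituted with probability `rate`."""
    bases = "ACGT"
    variant = []
    for base in parent:
        if rng.random() < rate:
            # Substitute with one of the three other bases, chosen uniformly.
            variant.append(rng.choice([b for b in bases if b != base]))
        else:
            variant.append(base)
    return "".join(variant)


def percent_difference(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)                      # fixed seed for reproducibility
parent = "ATGGCTTCTGGTGCTTCTAAACGTCCGGCT"   # hypothetical parent sequence
library = [mutagenize(parent, 0.05, rng) for _ in range(5)]
for variant in library:
    print(variant, round(percent_difference(parent, variant), 1))
```

Each variant pool would then be carried through selection and amplification as described above, and the cycle repeated.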



5.4. STRUCTURE OF SYNGENES
The syngenes of the present invention are
synthetic genes, that is, genes that are not derived
from a naturally occurring genome. In one embodiment,
syngenes are made up of, at least in part,
combinations of genes encoding functional domains,
such combinations not occurring in nature. Syngenes
may be composed of totally synthetic gene sequences or
combinations of natural and totally synthetic gene
sequences. Syngenes encode a syngene product which is
a protein having at least one functional domain. The
functional domain is a binding domain with affinity
for a ligand. In a preferred aspect, this binding
domain is characterized by 1) its strength of binding
under specific conditions, 2) the stability of its
binding under specific conditions, and 3) its
selective specificity for the chosen ligand. In
addition, the syngene may encode additional functional
domains, for example an amino acid sequence that
directs cellular trafficking of the syngene product to
the desired compartment of the cell; e.g., the
tetrapeptide KDEL (SEQ ID NO: 116), KKXX (SEQ ID NO:
118) or KXKXX (SEQ ID NO: 119) (where X is any amino
acid), to target peptides to the lumen of the
endoplasmic reticulum; or, e.g., the tetrapeptide YQRL
(SEQ ID NO: 117) to target membrane peptides or
secreted peptides to the trans Golgi network (see
Luzio and Banting, 1993, TIBS 18:395-398); or a
mitochondrial targeting pro-sequence (Singh et al.,
1990, Biochem. Biophys. Res. Commun. 169:391-396). A
third functional domain may be one that enhances
activity of the syngene product, for example a
transcriptional activation sequence.
The syngenes thus optionally encode
additional sequences besides the encoded binding
peptide. Such additional sequences can be markers,
linkers, transcriptional activation signals,
intracellular localization signals, in vivo targeting
sequences, processing signals, post-translational
modification signals, and the like. The syngene
preferably includes regulatory sequences that control
the expression of the coding region of the syngene,
nucleic acid coding sequences encoding the binding
peptide, and optional sequences.

In a preferred embodiment, the syngenes of
the invention are created by inserting nucleotides
encoding synthetic binding domains discovered by
screening libraries as described herein into the
context of additional protein coding sequences. This
can be done by replacing the DNA binding domain of a
known protein or by assembling a totally synthetic
gene. In addition to other protein coding sequences,
the syngene will generally contain other non-protein
coding sequences. These non-protein coding sequences
will generally be involved in the control of the
expression of the syngene. At a minimum, the syngene
preferably contains sequences for initiating
transcription and translation, as well as
transcriptional processing signals such as a poly A
addition signal. The syngene preferably also contains
a translation termination signal. Signals for
splicing of the RNA transcript encoded by the syngene
may also be desirable. The addition of nucleotide
sequences in the syngene other than those encoding the
binding domain can provide for in vivo or
intracellular targeting of the syngene or its encoded
product or transcriptional and/or post-translational
processing events. Thus, in a specific embodiment,
the syngene comprises a promoter operably linked to
the coding sequences, followed by (3') transcriptional
termination and poly A addition signals. Splice donor
and acceptor sequences and sequences affording
stability to the construct can also be added.
According to one embodiment of the invention, the
encoded syngene product can contain an additional
region, i.e., an amino acid sequence that functions as
a linker domain between one or more of the other
domains of the syngene. The presence of
the linker domain is optional, as is the type of
linker that may be used.

The syngene is preferably incorporated into
a vector for replication and/or expression of the
syngene. Any vector known in the art can be used. In
addition to comprising the syngene, the vector
comprises appropriate origin(s) of replication and,
preferably, a selectable marker allowing selection of
cells expressing the syngene product.

In a specific embodiment, syngene products
are hybrid proteins. The amino terminus will
generally consist of a methionine residue followed by
a spacer sequence of 1-15 amino acids, followed by one
or more binding domains (i.e., the sequences
identified by screening the peptide library against
the ligand of choice). The middle domain of the
hybrid protein will often consist of a marker that
aids in pre-clinical characterization, e.g., monitoring
expression of the syngene in vitro, as in, for
example, tissue culture cells. This is likely to be a
readily detectable epitope or other label, e.g., the
products of the β-galactosidase, luciferase or
chloramphenicol acetyl transferase genes. This marker
generally will be deleted when the syngene is used
clinically. A cellular trafficking domain (e.g., KDEL
(SEQ ID NO: 116) or YQRL (SEQ ID NO: 117)) may next
appear; a nuclear localization signal (NLS) at the carboxy
terminus is also preferably part of the hybrid protein
when it is desired that the syngene product be
targeted to the nucleus. Additional functional
domains, e.g., transcriptional activation sequences,
or spacers can also be included. Of course, it will
be readily apparent to one skilled in the art that the
order in which the binding domains, the marker, the
trafficking domain, and additional functional domains
appear in the syngene may not be critical. The order
disclosed above is merely illustrative of one
particular possible order. It will also be apparent
that, except for the ligand binding domain, the
presence of any other domains in a syngene product is
optional. An exemplary mRNA expressed by a syngene is
shown in Figure 2. The amino terminal sequence is
preferably in the range of 3-10 amino acids. The
binding domain mediates binding to the ligand of
choice. The "middle domain" can be a cellular
trafficking signal such as KDEL (SEQ ID NO: 116) or
YQRL (SEQ ID NO: 117), or a marker (label) domain.
The additional functional domains (optional) can be
transcriptional activation sequences, NLSs, or
linkers.

If the syngene product is designed to
activate the transcription of a gene, then the syngene
will, in addition to the domains discussed above (or
instead of the marker domain), contain a
transcriptional activation domain. Examples of such
domains are: a homopolymeric stretch of glutamine
residues, an acidic activation domain (sometimes known
as a "negative noodle"), a proline-rich domain, or a
serine-threonine-rich domain. It has been
demonstrated that when such activation domains are
directed to the DNA target, they lead to
transcriptional activation of that target (Gerber et
al., 1994, Science 263:808-811; Mitchell and Tjian,
1989, Science 245:371-378; Sadowski et al., 1988,
Nature 335:563-564; Ptashne and Gann, 1990, Nature
346:329-331).
A nuclear localization signal, such as that
of the nucleoplasmin gene, may be added. The
nucleoplasmin nuclear localization signal has been
shown to function in a variety of contexts, where it
leads to the nuclear localization of proteins in which
it appears. Nuclear localization signals that can be
used in the present invention are not limited to those
specifically disclosed herein; they include any that
are known in the art. In a particular embodiment, the
amino acid coding sequence of a syngene product begins
with an amino-terminal spacer sequence, for example
MGASGAS (SEQ ID NO: 120), followed by a nuclear
localization sequence which may be KRPAATKKAGQAKKKKR
(SEQ ID NO: 121) (the nucleoplasmin NLS; see Kang et
al., 1994, Proc. Natl. Acad. Sci. USA 91:340-344),
followed by a second spacer sequence, for example,
GASGAS (SEQ ID NO: 122), followed by the binding
domain. In another embodiment, the amino acid
sequence of a syngene product begins with an
amino-terminal spacer sequence, followed by the binding
domain, followed by a second spacer sequence, followed
by the carboxy-terminal nuclear localization signal
MISESLRKAIGKR (SEQ ID NO: 123) (Shieh et al., 1993,
Plant Physiol. 101:353-61). In another embodiment,
the NLS of the p50 subunit of NF-κB, VQRLRQLM (SEQ ID
NO: 124), is used (Henkel et al., 1992, Cell 68:1121-1133).
Other nuclear localization signals which can
be used as part of a syngene product include but are
not limited to those found in various steroid hormone
receptors, the SV40 large T antigen, or the consensus
sequence therefor, as set forth in Table 7 below.
TABLE 7*

Comparison of Amino Acid Sequences of (Potential)
Nuclear Localization Signals in Various Steroid
Receptors

Protein                     Species    Sequence              SEQ ID NO:

SV40 large T antigen                   12 G PKKKRKV          125
                                       PKKKRKV               126

Progesterone receptor       rabbit     638 (+10) RKFKKFNK    127
                            human      637 (+10)
                            chicken    491 (+10) L. .        128

Glucocorticoid receptor     human      491 (+10) RKTKKKIK    129
                            mouse      498 (+10)
                            rat        510 (+10)

Androgen receptor           human      628 (+10) RKLKKLGN    130
                            rat        612 (+10)

Mineralocorticoid receptor  human      673 (+10) RKSKKLGK    131

Estrogen receptor           human      256 (+11) RKDRRGGR    132
                            rat        261 (+11)
                            mouse      260 (+11)
                            chicken    250 (+11) E           133
                            Xenopus    251 (+11)

Consensus sequence                     RR R                  134
                                       RKD. . GG.
                                       KK K

* Taken from Guiochon-Mantel et al., 1989, Cell
57:1147-1154. The numbers refer to the amino acid
position of the first amino acid. In the case of
steroid receptors, the distance of this amino acid
from the last conserved cysteine in the DNA binding
region is indicated in brackets.

According to specific embodiments of the 
invention, the syngene can encode multiple copies of 
one or more binding domains or additionally contain 
multiple non-binding domains. 
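
The amino-terminal NLS arrangement described earlier in this section (spacer, nucleoplasmin NLS, second spacer, binding domain) can be written out as a short Python sketch that also allows multiple copies of a binding domain. The three fixed sequences are those given in the text (SEQ ID NOs: 120-122). The function name, the use of the GASGAS spacer between repeated binding domains, and the example binding domain are assumptions made for illustration only.

```python
AMINO_SPACER = "MGASGAS"                  # SEQ ID NO: 120
NUCLEOPLASMIN_NLS = "KRPAATKKAGQAKKKKR"   # SEQ ID NO: 121
SECOND_SPACER = "GASGAS"                  # SEQ ID NO: 122

VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def assemble_product(binding_domain: str, copies: int = 1,
                     linker: str = SECOND_SPACER) -> str:
    """Assemble spacer + NLS + spacer + one or more binding domains.

    Assumption: repeated binding domains are joined by `linker`; the text
    leaves the presence and type of linker optional.
    """
    if not binding_domain or set(binding_domain) - VALID_RESIDUES:
        raise ValueError("binding domain must use the 20 standard one-letter codes")
    if copies < 1:
        raise ValueError("at least one binding domain copy is required")
    domains = linker.join([binding_domain] * copies)
    return AMINO_SPACER + NUCLEOPLASMIN_NLS + SECOND_SPACER + domains


# "HWRLYT" is a hypothetical binding domain, not a sequence from the specification.
print(assemble_product("HWRLYT"))
print(assemble_product("HWRLYT", copies=2))
```

Other orders, such as the carboxy-terminal NLS embodiment, would follow the same pattern with the segments rearranged.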

Additionally, the syngene can be modified at
the base moiety, sugar moiety, or phosphate backbone,
to stabilize the syngene, promote in vivo transport or
localization, etc. Such modifications to bases, sugar
moieties, and phosphate backbones are made in such a
way that the modifications do not interfere with the
transcription of the syngene, and thus do not
interfere with the expression of the syngene product.
In a preferred aspect, such modifications are in a
noncoding region or at a 5' or 3' terminus of the
syngene. For example, the syngene may include other
appended groups such as peptides, or agents
facilitating transport across the cell membrane (see,
e.g., Letsinger et al., 1989, Proc. Natl. Acad. Sci.
U.S.A. 86:6553-6556; Lemaitre et al., 1987, Proc.
Natl. Acad. Sci. 84:648-652; PCT Publication No.
WO 88/09810, published December 15, 1988) or
blood-brain barrier (see, e.g., PCT Publication No.
WO 89/10134, published April 25, 1988),
hybridization-triggered cleavage agents (see, e.g., Krol et al.,
1988, BioTechniques 6:958-976) or intercalating agents
(see, e.g., Zon, 1988, Pharm. Res. 5:539-549).

The syngene may comprise at least one
modified base moiety which is selected from the group
including but not limited to 5-fluorouracil,
5-bromouracil, 5-chlorouracil, 5-iodouracil,
hypoxanthine, xanthine, 4-acetylcytosine,
5-(carboxyhydroxylmethyl) uracil,
5-carboxymethylaminomethyl-2-thiouridine,
5-carboxymethylaminomethyluracil, dihydrouracil,
beta-D-galactosylqueosine, inosine,
N6-isopentenyladenine, 1-methylguanine,
1-methylinosine, 2,2-dimethylguanine, 2-methyladenine,
2-methylguanine, 3-methylcytosine, 5-methylcytosine,
N6-adenine, 7-methylguanine,
5-methylaminomethyluracil,
5-methoxyaminomethyl-2-thiouracil, beta-D-mannosylqueosine,
5'-methoxycarboxymethyluracil, 5-methoxyuracil,
2-methylthio-N6-isopentenyladenine, uracil-5-oxyacetic
acid (v), wybutoxosine, pseudouracil, queosine,
2-thiocytosine, 5-methyl-2-thiouracil, 2-thiouracil,
4-thiouracil, 5-methyluracil, uracil-5-oxyacetic acid
methylester, uracil-5-oxyacetic acid (v),
5-methyl-2-thiouracil, 3-(3-amino-3-N-2-carboxypropyl) uracil,
(acp3)w, and 2,6-diaminopurine.

In another embodiment, the syngene comprises
at least one modified sugar moiety selected from the
group including but not limited to arabinose,
2-fluoroarabinose, xylulose, and hexose.

In yet another embodiment, the syngene
comprises at least one modified phosphate backbone
selected from the group consisting of a
phosphorothioate, a phosphorodithioate, a
phosphoramidothioate, a phosphoramidate, a
phosphordiamidate, a methylphosphonate, an alkyl
phosphotriester, and a formacetal or analog thereof.

In yet another embodiment, a portion of the
syngene (preferably noncoding) consists of α-anomeric
nucleotides. An α-anomeric oligonucleotide forms
specific double-stranded hybrids with complementary
RNA in which, contrary to the usual β-units, the
strands run parallel to each other (Gautier et al.,
1987, Nucl. Acids Res. 15:6625-6641).

Syngenes of the invention may be synthesized
by standard methods known in the art, e.g., by use of
an automated DNA synthesizer (such as are commercially
available from Biosearch, Applied Biosystems, etc.).
As examples, phosphorothioate oligonucleotides may be
synthesized by the method of Stein et al. (1988, Nucl.
Acids Res. 16:3209), methylphosphonate
oligonucleotides can be prepared by use of controlled
pore glass polymer supports (Sarin et al., 1988, Proc.
Natl. Acad. Sci. U.S.A. 85:7448-7451), etc.
Alternatively, the syngenes are made by recombinant
DNA techniques commonly known in the art, e.g., by
replicating a vector comprising the syngene in a
suitable host cell.



WO 96/06188 



PCT/US95/1052J 





In a specific embodiment, the syngene
comprises a 2'-O-methylribonucleotide (Inoue et al.,
1987, Nucl. Acids Res. 15:6131-6148), or a chimeric
RNA-DNA analogue (Inoue et al., 1987, FEBS Lett.
215:327-330).

5.5. EXPRESSION OF SYNGENES

It will be apparent to one skilled in the
art that many satisfactory gene arrangements can be
made such that the binding domains of the syngenes of
the present invention can be expressed in the desired
cell type in vitro or in vivo (see generally, Ausubel
et al. (eds.), 1993, Current Protocols in Molecular
Biology, John Wiley & Sons, Inc., N.Y.; Kriegler,
1990, Gene Transfer and Expression, A Laboratory
Manual, Stockton Press, N.Y.). In a specific
embodiment, the syngene is constructed as part of an
expression vector that can be introduced in vivo such
that it is taken up by a cell, within which cell the
vector or a portion thereof is transcribed, leading to
the production of the encoded syngene product. Such a
vector can remain episomal or become chromosomally
integrated, as long as it can be transcribed to
produce the syngene product. Such vectors can be
constructed by recombinant DNA technology methods
standard in the art. Vectors can be plasmid, viral,
or others known in the art, used for replication and
expression in mammalian cells. Expression of the
sequence encoding the syngene product can be by any
promoter known in the art to act in mammalian,
preferably human, cells. Such promoters can be
inducible or constitutive. Such promoters include but
are not limited to: the SV40 early promoter region
(Benoist and Chambon, 1981, Nature 290:304-310), the
promoter contained in the 3' long terminal repeat of
Rous sarcoma virus (Yamamoto et al., 1980, Cell
22:787-797), the herpes or HIV thymidine kinase
promoter (Wagner et al., 1981, Proc. Natl. Acad. Sci.
U.S.A. 78:1441-1445), the regulatory sequences of the
metallothionein gene (Brinster et al., 1982, Nature
296:39-42), cytomegalovirus (CMV) promoter, actin
promoter, phosphoglycerate kinase promoter, etc. For
example, promoters that function in a wide variety of
cell types can be used to obtain expression in a wide
variety of cell types where the syngene is introduced.
This may be done by using a viral promoter such as a
human cytomegalovirus early gene promoter or the
adenovirus major late promoter to drive the expression
of the syngene. Such viral promoters are active in a
wide variety of mammalian cell types. Alternatively,
it is possible to use a construct whereby the syngene
is transcribed from a constitutive mammalian cell
promoter. Examples of such constitutive promoters are
actin promoters and the PGK (phosphoglycerate kinase)
promoter. Since the latter promoter is functional for
high level transcription in all living mammalian cells,
it is an especially preferred choice.

In another embodiment the syngene is
constructed such that it can be expressed specifically
or substantially specifically in a specific tissue or
specific cell type. Numerous tissue and cell specific
promoters and control regions have been characterized
and it will be apparent to one skilled in the art how
to select the appropriate one for expression in the
desired cell type. For example, the following tissue
specific control regions may be used: elastase I gene
control region which is active in pancreatic acinar
cells (Swift et al., 1984, Cell 38:639-646; Ornitz et
al., 1986, Cold Spring Harbor Symp. Quant. Biol.
50:399-409; MacDonald, 1987, Hepatology 7:425-515);
insulin gene control region which is active in
pancreatic beta cells (Hanahan, 1985, Nature
315:115-122); immunoglobulin gene control region which is
active in lymphoid cells (Grosschedl et al., 1984,
Cell 38:647-658; Adames et al., 1985, Nature
318:533-538; Alexander et al., 1987, Mol. Cell. Biol.
7:1436-1444); mouse mammary tumor virus control region which
is active in testicular, breast, lymphoid and mast
cells (Leder et al., 1986, Cell 45:485-495); albumin
gene control region which is active in liver (Pinkert
et al., 1987, Genes and Devel. 1:268-276);
alpha-fetoprotein gene control region which is active in
liver (Krumlauf et al., 1985, Mol. Cell. Biol.
5:1639-1648; Hammer et al., 1987, Science 235:53-58);
alpha 1-antitrypsin gene control region which is active in the
liver (Kelsey et al., 1987, Genes and Devel.
1:161-171); beta-globin gene control region which is active
in myeloid cells (Mogram et al., 1985, Nature
315:338-340; Kollias et al., 1986, Cell 46:89-94); myelin basic
protein gene control region which is active in
oligodendrocyte cells in the brain (Readhead et al.,
1987, Cell 48:703-712); myosin light chain-2 gene
control region which is active in skeletal muscle
(Sani, 1985, Nature 314:283-286); and gonadotropic
releasing hormone gene control region which is active
in the hypothalamus (Mason et al., 1986, Science
234:1372-1378).

It will be apparent to one skilled in the
art that numerous diverse vectors can be used for the
delivery and subsequent expression of syngenes within
mammalian cells. Simple shuttle vectors may be used
such that a plasmid is grown in E. coli and purified
DNA transferred into mammalian cells by the use of
electroporation, calcium phosphate precipitates, DEAE
dextran or with the assistance of liposomes. Such an
exemplary shuttle vector is shown in Figure 3.
Furthermore, many excellent viral-based vectors can be
used depending upon the therapeutic application. For
instance, herpes simplex virus may be used to deliver
genes to the brain (Leib and Olivo, 1993, Bioessays
15:547-54); retroviruses (Salmons and Gunzburg, 1993,
Human Gene Therapy 4:129-142) and adeno-associated
virus (Walsh et al., 1993, Proc. Soc. Exp. Biol. Med.
204:289-300), which have the advantage of becoming
integrated into the cellular DNA, may be used to
deliver syngenes via transduction into hematopoietic,
pluripotent stem cells and other somatic cells.

One of skill in the art would recognize that
the underlying DNA sequences of the protein coding
regions of syngenes may be varied so as to take
advantage of known codon usage frequencies in
different organisms. The degeneracy of the genetic
code allows for a plurality of DNA sequences to be
constructed that all code for the same peptide or
protein in a syngene. In some cases, for example, it
may be desirable to introduce changes in the DNA
sequence of the syngene encoding the binding domain in
order to increase the expression of that binding
domain in the cell in which the syngene is to be used.
To do this, one would consult any well known table of
codon frequency usage, such as that found in Ausubel
et al. (eds.), 1993, Current Protocols in Molecular
Biology, John Wiley & Sons, Inc., N.Y., at Appendix 1
A.1.6. Using the information found there, one would
use any well known method in the art to change the
codons of the binding domain of the syngene that
happen to be little used in the organism in which the
syngene is to be expressed into codons that are more
common in that organism.
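
The codon replacement just described can be sketched in Python as a synonymous recoding step. The sketch uses the standard genetic code; the `PREFERRED` table is a placeholder chosen purely for illustration and is not taken from any published codon usage table, and the function names and example sequence are likewise assumptions. In practice the preferred codons would come from a codon frequency table for the intended host.

```python
# Build the standard genetic code (bases in T, C, A, G order).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
GENETIC_CODE = dict(zip(CODONS, AMINO_ACIDS))

# Placeholder preferences for illustration only (amino acid -> preferred codon).
PREFERRED = {"L": "CTG", "R": "CGC"}


def translate(dna: str) -> str:
    """Translate an in-frame DNA sequence; '*' marks a stop codon."""
    return "".join(GENETIC_CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def recode(dna: str, preferred: dict) -> str:
    """Replace codons with the preferred synonymous codon where one is listed."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence must contain a whole number of codons")
    for amino_acid, codon in preferred.items():
        if GENETIC_CODE[codon] != amino_acid:
            raise ValueError(f"{codon} does not encode {amino_acid}")
    recoded = []
    for i in range(0, len(dna), 3):
        codon = dna[i:i + 3]
        recoded.append(preferred.get(GENETIC_CODE[codon], codon))
    return "".join(recoded)


original = "ATGTTACGTTAA"   # hypothetical coding sequence: Met-Leu-Arg-stop
optimized = recode(original, PREFERRED)
print(optimized)
assert translate(optimized) == translate(original)  # encoded peptide is unchanged
```

Because only synonymous codons are exchanged, the encoded binding domain is unchanged while the nucleotide sequence is adapted to the host.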

5.6. ASSAY FOR THE ABILITY OF A
SYNGENE PRODUCT TO ACTIVATE
OR INHIBIT TRANSCRIPTION

Syngenes that encode binding domains that
bind to a DNA transcriptional regulatory region, but
that do not contain a transcriptional activation
signal, can be tested for their ability to inhibit the
transcription of a gene by the following assay,
presented by way of example but not limitation. The
assay is carried out by using three sets of plasmids
in an established cell line. The cell line expresses
a transcription factor that has a DNA binding site
that is the same as the DNA binding site of the
syngene product that is to be assayed. The three
plasmids have the following characteristics:

Plasmid 1: This plasmid contains the syngene
to be tested cloned into a vector that will direct the
expression of the syngene in the cell line.

Plasmid 2: This plasmid contains a reporter
gene. The promoter of the reporter gene contains the
DNA sequence which forms the binding site for the
transcription factor expressed by the cells and for
the binding domain encoded by the syngene. Normally,
this DNA sequence will be essentially the same
sequence that was used to screen the library in the
process of obtaining the binding domain encoded by the
syngene. The reporter gene codes for the expression
of any detectable marker such as, for example,
chloramphenicol acetyl transferase (CAT),
galactosidase, horseradish peroxidase, etc., under
the control of the promoter.

Plasmid 3: This is a control plasmid to
monitor the efficiency of transfection. It directs
the expression of a second reporter gene expressing a
product that can be easily assayed. The activity of
the promoter that drives the expression of this
product is not dependent upon binding of a
transcription factor to the binding site of the
syngene-encoded binding domain that is being assayed.
Preferably, the promoter is constitutive in the cell
being used. Alternatively, it is an inducible
promoter, and the assays described below are carried
out in the presence of its inducer. The reporter gene
in plasmid 3 is different from that in plasmid 2.



Two procedures are done, consisting of a
test procedure and a control procedure. In the test
procedure, the three plasmids are simultaneously
co-introduced into each cell. Introduction can be by any
of the well known methods in the art. Such methods
include electroporation, calcium phosphate mediated
transfection, and liposome mediated transfection.
Preferably, introduction is by electroporation. In
the control procedure, plasmids 2 and 3 are
co-introduced, without plasmid 1 (which contains the
syngene).

After culture for a time sufficient to allow
expression of the products encoded by the plasmids,
the amounts of expressed reporter gene products from
plasmids 2 and 3 are measured. If the syngene product
is able to compete with the transcription factor for
binding to the DNA binding sequence in the promoter of
the first reporter gene on plasmid 2, then the level
of expression of the first reporter gene product will
be low. This is because binding of the syngene
product to the DNA binding sequence in the promoter of
the first reporter gene will prevent binding of the
transcription factor to that sequence. Therefore, the
transcription factor will not be able to activate the
transcription of the first reporter gene. If the
syngene product is not able to compete with the
transcription factor, then the level of expression of
the first reporter gene product will be high. This
will be the case in the control procedure, where
plasmid 1, which contains the syngene, is not used.

The level of expression of the first
reporter gene product (from plasmid 2) is compared to
the level of expression of the second reporter gene
product (from plasmid 3) to determine the ratio of
expression of the two products. This ratio is
determined for both the test procedure with, and the
control procedure without, plasmid 1 containing the
syngene. When the ratio from the test procedure is
less than the ratio from the control procedure, this
indicates that the syngene product is competing with
the transcription factor for binding to the DNA
binding sequence in the promoter of the first reporter
gene. When the ratio from the test procedure is equal
to the ratio from the control procedure, this
indicates that the syngene product is unable to
compete with the transcription factor.

By way of example, to determine if a syngene
product can inhibit the transcription of a gene the
transcription of which depends upon binding of the
transcription factor NF-κB to its DNA binding site, a
human cell line that expresses NF-κB (p65) is used
(e.g., Jurkat, a T cell line; Okamoto et al., 1994, J.
Biol. Chem. 269:8582-8587). The first plasmid
contains the syngene, which is driven by the adenovirus
major late promoter or by some other promoter that is
active in the cells being used. The second plasmid
contains the first reporter gene, which consists of the
H2κb promoter (Baldwin and Sharp, 1988, Proc. Natl.
Acad. Sci. USA 85:723-727) directing the expression of
the CAT gene. The H2κb promoter is dependent upon
NF-κB binding for its activity. The control plasmid
consists of the luciferase gene (the second reporter
gene) driven by a promoter that does not depend upon
NF-κB for its expression.

The three plasmids are introduced into the
cells via electroporation. When the syngene that is
being assayed expresses a binding domain that is
specific for the NF-κB binding site of the H2κb gene
but does not also contain a transcriptional activation
site, then the ratio of CAT activity to luciferase
activity will be low when compared to the same ratio
obtained when no syngene, or a control syngene that
does not encode a binding domain that is specific for
the NF-κB binding site, is used.
This is because the product of a syngene that encodes a
binding domain that specifically binds the NF-κB
binding site will bind the NF-κB site in the reporter
plasmid containing the CAT gene. This will prevent
binding of the cell's endogenous NF-κB transcription
factor to that site, thus preventing NF-κB from
activating the promoter of the reporter gene.
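
The ratio comparison used in the inhibition assay can be
illustrated with a minimal sketch. It assumes that the two
reporter activities (for example, CAT and luciferase) have
already been measured in arbitrary units. The function
names, the 10% tolerance used to treat two ratios as
equal, and the sample readings are all hypothetical and
are not part of the assay as described.

```python
def normalized_ratio(first_reporter, second_reporter):
    """Ratio of the first reporter (plasmid 2) to the second reporter (plasmid 3).

    Dividing by the second reporter corrects for transfection efficiency.
    """
    if second_reporter <= 0:
        raise ValueError("second reporter activity must be positive")
    return first_reporter / second_reporter


def interpret_inhibition(test_ratio, control_ratio, tolerance=0.1):
    """Compare the test ratio (with plasmid 1) to the control ratio (without).

    `tolerance` is a hypothetical fraction of the control ratio within which
    the two ratios are treated as equal.
    """
    if test_ratio < control_ratio * (1 - tolerance):
        return "competes"
    return "no competition detected"


# Made-up readings in arbitrary units: (CAT activity, luciferase activity).
test_ratio = normalized_ratio(120.0, 1000.0)     # with the syngene
control_ratio = normalized_ratio(900.0, 1000.0)  # without the syngene

print(interpret_inhibition(test_ratio, control_ratio))
```

In practice the threshold for calling two ratios different
would be set from replicate measurements rather than from
a fixed tolerance.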

To determine whether a syngene-encoded
protein product can activate the transcription of a
gene, the same three plasmids can be used as are
described above in the test and control procedures.
The cell line used in the assay for activation
differs, however. To test for activation, a cell line
must be used that does not express the transcription
factor whose binding site is the same as that of the
syngene product. Because the cell line does not
express the transcription factor, the first reporter
gene on the second plasmid will not be transcribed in
the absence of a syngene product. If the syngene
product is capable of binding to the DNA binding
sequence in the promoter of the first reporter gene
and is also capable of activating transcription, the
reporter gene on the second plasmid will be expressed.
In this assay, the syngene to be tested will
preferably encode a transcriptional activation signal
in addition to a binding domain.

Once again, the ratio of first reporter gene
product to second reporter gene product is compared
for the test and control procedures (with and without
the syngene). In this case, however, the presence in
the test procedure of a syngene that encodes a product
that is capable of binding to the promoter of, and
activating the transcription of, the first reporter
gene will result in a larger value for the ratio in
the case of the test procedure as compared to the
control procedure. The ratio for the control
procedure is smaller, reflecting the inactivity of the
promoter of the first reporter gene in the absence of
the transcription factor and the syngene product.

As a specific example, to determine whether
a syngene can activate the transcription of a gene the
transcription of which depends upon the binding of the
transcription factor NF-κB to its DNA binding site, a
cell line is used that substantially does not express
NF-κB. HeLa cells (Scheinman et al., 1993, Mol. Cell.
Biol. 13:6089-6101) express only low levels of
endogenous NF-κB and thus can be used. Alternatively,
F9 embryonal carcinoma cells can be used. The same
three plasmids as above are co-transfected into this
cell line. If the syngene product contains a binding
domain specific for the NF-κB site as well as a
transcriptional activation domain, the ratio of CAT
activity to luciferase activity will be high when
compared to the same ratio obtained when no syngene,
or a control syngene product that lacks a binding
domain specific for the NF-κB site, is used. When no
syngene, or a control syngene, is used, CAT activity
will be very low because the cell, lacking NF-κB
activity, will transcribe the H2κb promoter (which
drives expression of the CAT gene) at a very low rate.
Thus, there will be very little CAT activity unless
the syngene to be assayed binds to the NF-κB site on
the H2κb promoter and activates expression of the CAT
gene.
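
For the activation assay, the comparison of CAT to
luciferase ratios with and without the syngene can be
summarized as a fold change. The sketch below is
illustrative only: averaging over replicate transfections
is an assumption not stated in the text, and the function
name and all readings are hypothetical.

```python
from statistics import mean


def fold_activation(test_wells, control_wells):
    """Mean CAT/luciferase ratio with the syngene divided by that without it.

    Each well is a (cat_activity, luciferase_activity) pair in arbitrary
    units. Luciferase readings must be non-zero, and the control ratio must
    be non-zero, or a ZeroDivisionError is raised.
    """
    def mean_ratio(wells):
        return mean(cat / luciferase for cat, luciferase in wells)

    return mean_ratio(test_wells) / mean_ratio(control_wells)


# Made-up replicate readings: (CAT activity, luciferase activity).
with_syngene = [(400.0, 1000.0), (600.0, 1000.0)]
without_syngene = [(20.0, 1000.0), (30.0, 1000.0)]

print(fold_activation(with_syngene, without_syngene))
```

A value well above 1 corresponds to the larger test ratio
expected when the syngene product both binds the promoter
and activates transcription; a value near 1 corresponds to
no detectable activation.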

If a cell line that does not express a
transcription factor of interest is not readily
available, it may be made by methods known in the art.
For example, the gene for the transcription factor of
interest may be inactivated by promoting homologous
recombination of that gene with a non-functional copy
of itself (Koller and Smithies, 1989, Proc. Natl.
Acad. Sci. USA 86:8932-8935; Zijlstra et al., 1989,
Nature 342:435-438).

A preferred vector for expression of a
syngene for characterization of its biological
activity is that described by Pei et al., 1991, J.
Biol. Chem. 266:9598-9604. This vector consists of an
adenovirus major late promoter driving a gene cassette
with the adenovirus tripartite leader sequence and a
hemoglobin splice donor and acceptor, followed by the
desired coding sequences, followed by the
polyadenylation site and 3' non-coding sequences from
the SV40 virus early transcriptional unit. In
addition, this vector has the pBR322 replicon and an
antibiotic resistance marker for growth in E. coli.

Although the above discussion was couched in
terms of NF-κB binding sites, it will be readily
understood by one of skill in the art that the same
general method of testing the ability of a syngene to
activate or inhibit transcription can be extended to
test the activity of syngene products that bind to
various other transcription factor binding sites, or
that bind to a ligand of choice and thereby activate
or inhibit an activity that is detectable in the
assay.

5.7. DIFFERENTIAL REGULATION OF CLOSELY
RELATED TRANSCRIPTIONAL REGULATORY SITES

It is well known that the DNA sequences that
define the binding sites of many transcription factors
differ somewhat from gene to gene. Thus, the NF-κB
binding site in the HLA class I genes differs from the
NF-κB site in the IL-6 gene or the IL-8 gene
(respectively termed IL-6κB and IL-8κB). See Section
6.1.1. While, for some transcription factors,
slightly different sites are recognized by slightly
different versions of the transcription factors, in
some cases the same transcription factor recognizes
the somewhat different versions of its binding site.

In those cases where the same transcription
factor recognizes different binding sites, the use of
the syngenes of the present invention affords the
opportunity for achieving a level of regulation of the
different sites that surpasses that which could be
achieved by the use of the natural transcription
factor. This is because it is possible to isolate
highly specific DNA binding domains from random
synthetic peptide libraries according to the present
invention that bind specifically to one member of the
related binding sites of a given transcription factor
but that do not bind to the other sites. This is done
by screening a random synthetic peptide library with a
first oligonucleotide ligand that represents the DNA
binding site that it is desired to specifically
regulate. After isolating a group of binding domains
that bind to this first oligonucleotide, one then
makes use of other oligonucleotides that are closely
related to, but slightly different from, the first
oligonucleotide. By selecting against binding domains
that will bind these other oligonucleotides (by use of
the methods described in Section 6.1 and its
subsections), one obtains a highly specific DNA
binding domain, i.e., a binding domain that
specifically binds only to the first oligonucleotide,
and not to the others. Using this approach, for
example, it is possible to regulate the activity of
the IL-6 gene (by use of a syngene expressing an HSDB
specific to the NF-κB site of the IL-6 gene) without
affecting the activity of the IL-8 gene.
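
The two-step selection described above (keep binding
domains that bind the first oligonucleotide, then discard
those that also bind a closely related oligonucleotide)
can be sketched as a simple filter. The peptide names,
oligonucleotide labels, and binding calls below are
invented for illustration; the sketch says nothing about
how binding is actually measured.

```python
def select_specific_binders(binding, target, related):
    """Return peptides that bound `target` and none of the `related` sites.

    `binding` maps each peptide to the set of oligonucleotides it was
    observed to bind in the screen.
    """
    related = set(related)
    return sorted(
        peptide
        for peptide, bound in binding.items()
        if target in bound and not (bound & related)
    )


# Hypothetical screening results.
binding = {
    "peptide_A": {"IL-6 kB"},
    "peptide_B": {"IL-6 kB", "IL-8 kB"},
    "peptide_C": {"IL-8 kB"},
    "peptide_D": {"IL-6 kB", "HLA class I kB"},
}

specific = select_specific_binders(
    binding, target="IL-6 kB", related=["IL-8 kB", "HLA class I kB"]
)
print(specific)  # ['peptide_A']
```

Only the peptide that binds the first oligonucleotide and
none of the related ones survives, which corresponds to
the highly specific DNA binding domain sought in the text.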

5.8. USES OF SYNGENES 

The invention also provides methods for use
of the syngenes, e.g., in diagnosis and therapy of
various disorders.

The present invention vastly expands the
number of genes that are available for use in gene
therapy. The present invention provides methods for
finding synthetic nucleic acids encoding proteins with
diagnostic or therapeutic value, or value in in vitro
assays. The present invention further provides
methods for delivery of such peptides or proteins in
vivo by expression in vivo from the administered
syngene.

The present invention provides methods for
using synthetic peptides or proteins, and the genes
encoding them, to fulfill roles that naturally
occurring proteins have not necessarily been evolved
for. One need only identify a target whose activity
it would be desirable to modulate. Given such a
target, a synthetic peptide in a random peptide
library that binds the target and can thus modulate
the target's activity can be readily identified by the
present invention. The present invention also
provides for the identification of a nucleic acid
(syngene) that encodes the peptide. Such a syngene
can be used, e.g., in gene therapy. As an example,
the present invention provides methods for identifying
syngenes that inhibit or enhance the transcriptional
activity of a wide variety of naturally occurring
genes, preferably with a specificity not found in
natural systems. In a specific embodiment, syngenes
can be used to effect differential regulation of
closely related transcriptional regulatory sites as
described in Section 5.7.

In addition to affecting transcription,
among the other nuclear processes that syngenes may be
useful in modulating are: post-transcriptional
processing of RNA, DNA repair, and DNA replication.
Syngene products may bind to actin and thus be of use
in regulating mitotic spindle formation and cell
division.

Syngenes are also useful for modulating
processes that occur outside of the nucleus. For
example, in the cytoplasm, the encoded protein of a
syngene, produced by intracellular expression of the
syngene, may be used to affect the activity of
inhibitors of transcription factors such as NF-κB.
They may also be used to modulate signal transduction
pathways, metabolic pathways, RNA translation, and
intracellular trafficking. In the cell membrane,
syngenes may be used to modulate the activity of
membrane receptors, ion channels, or exocytotic and
endocytotic pathways.

In tissue, syngenes, via expression of their
encoded proteins, may be used to regulate cell/cell
signalling and transcytosis. Cell/cell junctions and
the extracellular matrix are appropriate targets for
syngenes. Syngenes may be used to regulate cell
adhesion or cell/cell recognition. In the general
circulation, syngenes may be used to regulate the
activity of receptor ligands.

Syngenes can be used in any appropriate
method of gene therapy, as would be recognized by
those in the art upon considering this disclosure.
The resulting action of a syngene in the gene therapy
patient can, for example, lead to the activation or
inhibition of a preselected gene in the patient, thus
leading to improvement of the diseased condition
afflicting the patient. Methods of gene therapy are
detailed in Section 5.8.1.

Alternatively, the syngenes may be
introduced into appropriate host cells and thereby
used for the recombinant production of their encoded
peptides. The peptides thus produced can be used,
e.g., in in vitro assays to detect and/or quantitate
the ligand of choice to which they bind, for
therapeutic purposes, diagnostic purposes, or to
regulate transcription (e.g., where the ligand of
choice modulates transcription).

In another aspect, the invention relates to
the peptide products of syngenes and their therapeutic
and diagnostic uses. The syngene-encoded peptides can
be used therapeutically and diagnostically, to
regulate transcription, signal transduction, etc., as
described herein for syngenes. The syngene products
can be used in in vitro binding assays, to detect
and/or measure amounts of their binding partners
(e.g., the ligand of choice). The syngene products
can be used in vivo, e.g., labeled with an appropriate
marker, to image their binding partners.

5.8.1. GENE THERAPY

In its broadest sense, gene therapy refers
to therapy performed by the administration of a
nucleic acid to a subject. The nucleic acid, either
directly or indirectly via its encoded protein,
mediates a therapeutic effect in the subject.

The syngenes of the present invention can be
used in any of the methods for gene therapy available
in the art. The descriptions below are meant to be
illustrative of such methods. It will be readily
understood by those of skill in the art that the
methods illustrated represent only a sample of all
available methods of gene therapy.

For general reviews of the methods of gene
therapy, see Goldspiel et al., 1993, Clinical Pharmacy
12:488-505; Wu and Wu, 1991, Biotherapy 3:87-95;
Tolstoshev, 1993, Ann. Rev. Pharmacol. Toxicol.
32:573-596; Mulligan, 1993, Science 260:926-932;
Morgan and Anderson, 1993, Ann. Rev. Biochem.
62:191-217; and May, 1993, TIBTECH 11(5):155-215. Methods
commonly known in the art of recombinant DNA
technology which can be used are described in Ausubel
et al. (eds.), 1993, Current Protocols in Molecular
Biology, John Wiley & Sons, NY; and Kriegler, 1990,
Gene Transfer and Expression, A Laboratory Manual,
Stockton Press, NY.

Delivery of the syngene into a patient may
be either direct, in which case the patient is
directly exposed to the syngene or syngene-carrying
vector, or indirect, in which case cells are first
transformed with the syngene in vitro, then
transplanted into the patient. These two approaches
are known, respectively, as in vivo and ex vivo gene
therapy.

In a specific embodiment, the syngene is
directly administered in vivo for therapeutic effect,
whereby it is expressed to produce the syngene
product. This can be accomplished by any of numerous
methods known in the art, e.g., by constructing it as
part of an appropriate nucleic acid expression vector
and administering it so that it becomes intracellular,
e.g., by infection using a defective or attenuated
retroviral or other viral vector (see U.S. Patent No.
4,980,286), or by direct injection of naked DNA, or by
use of microparticle bombardment (e.g., a gene gun;
Biolistic, Dupont), or coating with lipids or
cell-surface receptors or transfecting agents,
encapsulation in liposomes, microparticles, or
microcapsules, or by administering it in linkage to a
peptide which is known to enter the nucleus, by
administering it in linkage to a ligand subject to
receptor-mediated endocytosis (see, e.g., Wu and Wu,
1987, J. Biol. Chem. 262:4429-4432) (which can be used
to target cell types specifically expressing the
receptors), etc. In another embodiment, a
syngene-ligand complex can be formed in which the ligand
comprises a fusogenic viral peptide to disrupt
endosomes, allowing the syngene to avoid lysosomal
degradation. In yet another embodiment, the syngene
can be targeted in vivo for cell specific uptake and
expression, by targeting a specific receptor (see,
e.g., PCT Publications WO 92/06180 dated April 16,
1992 (Wu et al.); WO 92/22635 dated December 23, 1992
(Wilson et al.); WO 92/20316 dated November 26, 1992
(Findeis et al.); WO 93/14188 dated July 22, 1993
(Clarke et al.); WO 93/20221 dated October 14, 1993
(Young)). Alternatively, the syngene can be
introduced intracellularly and incorporated within
host cell DNA for expression, by homologous
recombination (Koller and Smithies, 1989, Proc. Natl.
Acad. Sci. USA 86:8932-8935; Zijlstra et al., 1989,
Nature 342:435-438).

One common method of practicing gene therapy
is by making use of retroviral vectors (see Miller et
al., 1993, Meth. Enzymol. 217:581-599). A retroviral
vector is a retrovirus that has been modified to
incorporate a preselected gene in order to effect the
expression of that gene. It has been found that many
of the naturally occurring DNA sequences of
retroviruses are dispensable in retroviral vectors.
Only a small subset of the naturally occurring DNA
sequences of retroviruses is necessary. In general, a
retroviral vector must contain all of the cis-acting
sequences necessary for the packaging and integration
of the viral genome. These cis-acting sequences are:

a) a long terminal repeat (LTR), or portions
thereof, at each end of the vector;

b) primer binding sites for negative and positive
strand DNA synthesis; and

c) a packaging signal, necessary for the
incorporation of genomic RNA into virions.

The gene to be used in gene therapy is
cloned into the vector, which facilitates delivery of
the gene into a patient.
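
The list of required cis-acting sequences lends itself to
a simple completeness check on a candidate vector design.
The sketch below is a hypothetical illustration: the
feature labels and the example annotation are assumptions
made for the example, not a format used by any particular
vector design tool.

```python
# Cis-acting elements named in items a) through c); the labels are hypothetical.
REQUIRED_CIS_ELEMENTS = (
    "5' LTR",
    "3' LTR",
    "primer binding site (-)",
    "primer binding site (+)",
    "packaging signal",
)


def missing_cis_elements(vector_features):
    """Return the required cis-acting elements absent from a vector annotation."""
    present = set(vector_features)
    return [element for element in REQUIRED_CIS_ELEMENTS if element not in present]


# Hypothetical annotation of a candidate vector carrying a syngene.
candidate = [
    "5' LTR",
    "packaging signal",
    "syngene",
    "primer binding site (-)",
    "3' LTR",
]

print(missing_cis_elements(candidate))  # ['primer binding site (+)']
```

An empty result would indicate that every element in the
list is annotated; it does not by itself show that the
elements are functional or correctly positioned.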

More detail about retroviral vectors can be
found in Boesen et al., 1994, Biotherapy 6:291-302,
which describes the use of a retroviral vector to
deliver the mdr1 gene to hematopoietic stem cells in
order to make the stem cells more resistant to
chemotherapy. Other references illustrating the use
of retroviral vectors in gene therapy are: Clowes et
al., 1994, J. Clin. Invest. 93:644-651; Kiem et al.,
1994, Blood 83:1467-1473; Salmons and Gunzburg, 1993,
Human Gene Therapy 4:129-141; and Grossman and Wilson,
1993, Curr. Opin. in Genetics and Devel. 3:110-114.

Adenoviruses are also of use in gene
therapy. Adenoviruses are especially attractive
vehicles for delivering genes to respiratory
epithelia. Adenoviruses naturally infect respiratory
epithelia, where they cause a mild disease. Other
targets for adenovirus-based delivery systems are
liver, the central nervous system, endothelial cells,
and muscle. Adenoviruses have the advantage of being
capable of infecting non-dividing cells. Kozarsky and
Wilson, 1993, Current Opinion in Genetics and
Development 3:499-503 present a review of
adenovirus-based gene therapy. Bout et al., 1994, Human
Gene Therapy 5:3-10 demonstrated the use of adenovirus
vectors to transfer genes to the respiratory epithelia
of rhesus monkeys. Other instances of the use of
adenoviruses in gene therapy can be found in Rosenfeld
et al., 1991, Science 252:431-434; Rosenfeld et al.,
1992, Cell 68:143-155; and Mastrangeli et al., 1993,
J. Clin. Invest. 91:225-234.

It has been proposed that adeno-associated
virus (AAV) be used in gene therapy (Walsh et al.,
1993, Proc. Soc. Exp. Biol. Med. 204:289-300).

Another approach to gene therapy involves
transferring a gene to cells in tissue culture by such
methods as electroporation, lipofection, calcium
phosphate mediated transfection, or viral infection.
Usually, the method of transfer includes the transfer
of a selectable marker to the cells. The cells are
then placed under selection to isolate those cells
that have taken up and are expressing the transferred
gene. Those cells are then delivered to a patient.

WO 96/06188    PCT/US95/10523

In this embodiment, the syngene is
introduced into a cell prior to administration in vivo
of the resulting recombinant cell. Such introduction
can be carried out by any method known in the art,
including but not limited to transfection,
electroporation, microinjection, infection with a
viral or bacteriophage vector containing the syngene
sequences, cell fusion, chromosome-mediated gene
transfer, microcell-mediated gene transfer,
spheroplast fusion, etc. Numerous techniques are
known in the art for the introduction of foreign genes
into cells (see, e.g., Loeffler and Behr, 1993, Meth.
Enzymol. 217:599-618; Cohen et al., 1993, Meth.
Enzymol. 217:618-644; Cline, 1985, Pharmac. Ther.
29:69-92) and may be used in accordance with the
present invention, provided that the necessary
developmental and physiological functions of the
recipient cells are not disrupted. The technique
should provide for the stable transfer of the syngene
to the cell, so that the syngene is expressible by the
cell and preferably heritable and expressible by its
cell progeny.

The resulting recombinant cells can be
delivered to a patient by various methods known in the
art. In a preferred embodiment, epithelial cells are
injected, e.g., subcutaneously. In another
embodiment, recombinant skin cells may be applied as a
skin graft onto the patient. Recombinant blood cells
(e.g., hematopoietic stem or progenitor cells) are
preferably administered intravenously. The amount of
cells envisioned for use depends on the desired
effect, patient state, etc., and can be determined by
one skilled in the art.

Cells into which a syngene can be introduced
for purposes of gene therapy encompass any desired,
available cell type, and include but are not limited
to epithelial cells, endothelial cells, keratinocytes,
fibroblasts, muscle cells, hepatocytes; blood cells
such as T lymphocytes, B lymphocytes, monocytes,
macrophages, neutrophils, eosinophils, megakaryocytes,
granulocytes; various stem or progenitor cells, in
particular hematopoietic stem or progenitor cells,
e.g., as obtained from bone marrow, umbilical cord
blood, peripheral blood, fetal liver, etc.

In a preferred embodiment, the cell used for 
gene therapy is autologous to the patient. 

In an embodiment in which recombinant cells
are used in gene therapy, a syngene is introduced into
the cells such that it is expressible by the cells or
their progeny, and the recombinant cells are then
administered in vivo for therapeutic effect. Stem and
progenitor cells are preferred for use. Any stem
and/or progenitor cells which can be isolated and
maintained in vitro can potentially be used in
accordance with this embodiment of the present
invention. Such stem cells include but are not
limited to hematopoietic stem cells (HSC), stem cells
of epithelial tissues such as the skin and the lining
of the gut, embryonic heart muscle cells, liver stem
cells (PCT Publication WO 94/08598, dated April 28,
1994), and neural stem cells (Stemple and Anderson,
1992, Cell 71:973-985).

Epithelial stem cells (ESCs) or
keratinocytes can be obtained from tissues such as the
skin and the lining of the gut by known procedures
(Rheinwald, 1980, Meth. Cell Bio. 21A:229). In
stratified epithelial tissue such as the skin, renewal
occurs by mitosis of stem cells within the germinal
layer, the layer closest to the basal lamina. Stem
cells within the lining of the gut provide for a rapid
renewal rate of this tissue. ESCs or keratinocytes
obtained from the skin or lining of the gut of a
patient or donor can be grown in tissue culture
(Rheinwald, 1980, Meth. Cell Bio. 21A:229; Pittelkow
and Scott, 1986, Mayo Clinic Proc. 61:771). If the
ESCs are provided by a donor, a method for suppression
of host versus graft reactivity (e.g., irradiation,
drug or antibody administration to promote moderate
immunosuppression) can also be used.

With respect to hematopoietic stem cells
(HSC), any technique which provides for the isolation,
propagation, and maintenance in vitro of HSC can be
used in this embodiment of the invention. Techniques
by which this may be accomplished include (a) the
isolation and establishment of HSC cultures from bone
marrow cells isolated from the future host, or a
donor, or (b) the use of previously established
long-term HSC cultures, which may be allogeneic or
xenogeneic. Non-autologous HSC are used preferably in
conjunction with a method of suppressing
transplantation immune reactions of the future
host/patient. In a particular embodiment of the
present invention, human bone marrow cells can be
obtained from the posterior iliac crest by needle
aspiration (see, e.g., Kodo et al., 1984, J. Clin.
Invest. 73:1377-1384). In a preferred embodiment of
the present invention, the HSCs can be made highly
enriched or in substantially pure form. This
enrichment can be accomplished before, during, or
after long-term culturing, and can be done by any
techniques known in the art. Long-term cultures of
bone marrow cells can be established and maintained by
using, for example, modified Dexter cell culture
techniques (Dexter et al., 1977, J. Cell Physiol.
91:335) or Whitlock-Witte culture techniques (Whitlock
and Witte, 1982, Proc. Natl. Acad. Sci. USA
79:3608-3612).

In a specific embodiment, the syngene to be
introduced for purposes of gene therapy comprises an
inducible promoter operably linked to the coding
region, such that expression of the syngene is
controllable by controlling the presence or absence of
the appropriate inducer of transcription.

5.9. PHARMACEUTICAL COMPOSITIONS

The invention provides methods of treatment
by administration to a subject of an effective amount
of a pharmaceutical (therapeutic) composition
comprising a syngene. Such a syngene envisioned for
therapeutic use is referred to hereinafter as a
"Therapeutic" or "Therapeutic of the invention." As
used hereinbelow, "Therapeutic" or "Therapeutic of the
invention" shall also be used to refer to molecules
comprising the encoded peptide products of syngenes,
e.g., where a molecule comprising the peptide binding
domain encoded by a syngene is used therapeutically.
In a preferred aspect, the Therapeutic is
substantially purified. The subject is preferably an
animal, including but not limited to animals such as
cows, pigs, horses, chickens, cats, dogs, etc., and is
preferably a mammal, and most preferably human.

Formulations and methods of administration
that can be employed when the Therapeutic comprises a
syngene are described in Section 5.8.1; additional
appropriate formulations and routes of administration
can be selected from among those described
hereinbelow.

Various delivery systems are known and can
be used to administer a Therapeutic of the invention,
e.g., encapsulation in liposomes, microparticles,
microcapsules, recombinant cells containing the
Therapeutic, receptor-mediated endocytosis (see, e.g.,
Wu and Wu, 1987, J. Biol. Chem. 262:4429-4432),
construction of a Therapeutic syngene as part of a
retroviral or other vector, etc. Methods of
introduction include but are not limited to
intradermal, intramuscular, intraperitoneal,
intravenous, subcutaneous, intranasal, epidural, and
oral routes. The compounds may be administered by any
convenient route, for example by infusion or bolus
injection, by absorption through epithelial or
mucocutaneous linings (e.g., oral mucosa, rectal and
intestinal mucosa, etc.) and may be administered
together with other biologically active agents.
Administration can be systemic or local. In addition,
it may be desirable to introduce the pharmaceutical
compositions of the invention into the central nervous
system by any suitable route, including
intraventricular and intrathecal injection;
intraventricular injection may be facilitated by an
intraventricular catheter, for example, attached to a
reservoir, such as an Ommaya reservoir. In a specific
embodiment, it may be desirable to utilize liposomes
targeted via antibodies to specific identifiable cell
surface antigens.

In a preferred aspect, the Therapeutic
comprises a syngene that is part of an expression
vector that expresses the syngene in a suitable host.
In particular, such a syngene has a promoter operably
linked to the syngene coding region, said promoter
being inducible or constitutive. In another
particular embodiment, the syngene is a nucleic acid
molecule in which the syngene coding sequences and any
other desired sequences are flanked by regions that
promote homologous recombination at a desired site in
the genome, thus providing for intrachromosomal
expression of the syngene (Koller and Smithies, 1989,
Proc. Natl. Acad. Sci. USA 86:8932-8935; Zijlstra et
al., 1989, Nature 342:435-438).

In a specific embodiment, it may be
desirable to administer the Therapeutics of the
invention locally to the area in need of treatment;
this may be achieved by, for example, and not by way
of limitation, local infusion during surgery, topical
application, e.g., in conjunction with a wound
dressing after surgery, by injection, by means of a
catheter, by means of a suppository, or by means of an
implant, said implant being of a porous, non-porous,
or gelatinous material, including membranes, such as
sialastic membranes, or fibers.

The present invention provides
pharmaceutical compositions. Such compositions
comprise a therapeutically effective amount of a
Therapeutic, and a pharmaceutically acceptable carrier
or excipient. Such a carrier includes but is not
limited to saline, buffered saline, dextrose, water,
glycerol, ethanol, and combinations thereof. The
carrier and composition can be sterile. The
formulation should suit the mode of administration.

The composition, if desired, can also 
contain minor amounts of wetting or emulsifying 
agents, or pH buffering agents. The composition can 
be a liquid solution, suspension, emulsion, tablet, 
pill, capsule, sustained release formulation, or 
powder. The composition can be formulated as a 
suppository, with traditional binders and carriers 
such as triglycerides. Oral formulation can include 
standard carriers such as pharmaceutical grades of 
mannitol, lactose, starch, magnesium stearate, sodium 
saccharine, cellulose, magnesium carbonate, etc. 

In a preferred embodiment, the composition
is formulated in accordance with routine procedures as
a pharmaceutical composition adapted for intravenous
administration to human beings. Typically,
compositions for intravenous administration are
solutions in sterile isotonic aqueous buffer. Where
necessary, the composition may also include a
solubilizing agent and a local anesthetic such as
lignocaine to ease pain at the site of the injection.
Generally, the ingredients are supplied either
separately or mixed together in unit dosage form, for
example, as a dry lyophilized powder or water free
concentrate in a hermetically sealed container such as
an ampoule or sachette indicating the quantity of
active agent. Where the composition is to be
administered by infusion, it can be dispensed with an
infusion bottle containing sterile pharmaceutical
grade water or saline. Where the composition is
administered by injection, an ampoule of sterile water
for injection or saline can be provided so that the
ingredients may be mixed prior to administration.

The Therapeutics of the invention can be
formulated as neutral or salt forms. Pharmaceutically
acceptable salts include those formed with free amino
groups such as those derived from hydrochloric,
phosphoric, acetic, oxalic, tartaric acids, etc., and
those formed with free carboxyl groups such as those
derived from sodium, potassium, ammonium, calcium,
ferric hydroxides, isopropylamine, triethylamine,
2-ethylamino ethanol, histidine, procaine, etc.

The amount of the Therapeutic of the
invention which will be effective in the treatment of
a particular disorder or condition will depend on the
nature of the disorder or condition, and can be
determined by standard clinical techniques. In
addition, in vitro assays may optionally be employed
to help identify optimal dosage ranges. The precise
dose to be employed in the formulation will also
depend on the route of administration, and the
seriousness of the disease or disorder, and should be
decided according to the judgment of the practitioner
and each patient's circumstances.

The invention also provides a pharmaceutical
pack or kit comprising one or more containers filled
with one or more of the ingredients of the
pharmaceutical compositions of the invention.
Optionally associated with such container(s) can be a
notice in the form prescribed by a governmental agency
regulating the manufacture, use or sale of
pharmaceuticals or biological products, which notice
reflects approval by the agency of manufacture, use or
sale for human administration.

6. EXAMPLES

For the examples incorporated herein, a TSAR
library is utilized; however, to those skilled in the
art, it will be apparent that other random peptide
display libraries may be used. An example of a TSAR
library is the TSAR-9 library disclosed in Kay et al.,
1993, Gene 128:59-65. TSAR-9 constructs display a
peptide of about 38 amino acids in length having 36
totally random positions.

6.1. ISOLATION OF HSDBs FOR NF-κB
AND NF-IL6 BINDING SITES

6.1.1. OLIGONUCLEOTIDE TARGETS

The synthetic oligonucleotide targets
(ligands) for isolating HSDBs that encode peptide
sequences that bind to specific cis-elements for
several genes of interest are described in the
following paragraphs.

For the NF-κB site of HLA class I genes the
sequence is 5'gggtGGGGATTCCCCatct (SEQ ID NO: 135)
(called H2κB-U) and 5'agatGGGGAATCCCCaccc (SEQ ID NO:
136) (called H2κB-L) (the suffix U stands for upper
strand and the suffix L stands for lower strand). The
double stranded oligonucleotide formed by the
hybridization of H2κB-U and H2κB-L is called the H2κB
oligonucleotide. The upper case sequence is the known
NF-κB binding site (see Baldwin and Sharp, 1988, Proc.
Natl. Acad. Sci. USA 85:723-727). This particular
sequence is homologous to the NF-κB regulatory domain
of the murine homologue of the human HLA class I gene.
The murine sequence is from the H-2k(d) gene and is in
the databases as GenBank accession X01815. The upper
case NF-κB site is completely conserved in human HLA
class I genes such as HLA-B27 (GenBank M12967) and
HLA-J (GenBank M80470) and HLA-A2 (GenBank K02883).

IL-6 is an attractive target for regulation
by syngenes since it has been shown to be involved in
immune responses and the acute phase protein responses
(Kishimoto, 1989, Blood 74:1-10; Clark, 1989, Ann.
N.Y. Acad. Sci. 557:438-443; Castell et al., 1989,
Ann. N.Y. Acad. Sci. 557:87-101). For the NF-κB site
in the IL-6 gene, the following oligonucleotides are
used: 5'atgtGGGATTTTCCcatg (SEQ ID NO: 137) (called
IL6κB-U) and 5'catgGGAAAATCCCacat (SEQ ID NO: 138)
(called IL6κB-L). The double stranded oligonucleotide
formed by the hybridization of IL6κB-U and IL6κB-L is
called the IL6κB oligonucleotide. This site has been
defined as the NF-κB regulatory sequence in the IL-6
gene by Taiji et al., 1993, Proc. Natl. Acad. Sci. USA
90:10193-10197.

For the NF-κB site in the IL-8 gene, the
following oligonucleotides are used:
5'atcgTGGAATTTCCtctg (SEQ ID NO: 139) (called IL8κB-U)
and 5'cagaGGAAATTCCAcgat (SEQ ID NO: 140) (called
IL8κB-L). The double stranded oligonucleotide formed
by the hybridization of IL8κB-U and IL8κB-L is called
the IL8κB oligonucleotide. This site has been defined
as the NF-κB regulatory sequence in the IL-8 gene by
Kunsch and Rosen, 1993, Mol. Cell. Biol. 13:6137-6146.

For the NF-IL6 site found in the IL-6 gene,
the following oligonucleotides are synthesized:
5'acgtcATTGCACAAtctt (SEQ ID NO: 141) (called NFIL6-U)
and 5'aagaTTGTGCAATgacgt (SEQ ID NO: 142) (called
NFIL6-L). The double stranded oligonucleotide formed
by the hybridization of NFIL6-U and NFIL6-L is called
the NFIL6 oligonucleotide. Those nucleotides in upper
case correspond to the consensus sequence for an
NF-IL6 site and have been shown to constitute a
functional NF-IL6 regulatory cis-element (Zhang et al.,
1994, Proc. Natl. Acad. Sci. USA 91:2225-2229).
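
Each upper/lower strand pair listed above is intended to anneal into a blunt-ended duplex, so the lower strand should be the exact reverse complement of the upper strand. The following is a minimal Python sketch of that check; the dictionary keys and function names are illustrative assumptions, and only the sequences themselves are taken from the text.

```python
# Oligonucleotide pairs from Section 6.1.1 as (upper strand, lower strand),
# written 5' to 3'. Upper case marks the binding site, as in the text.
PAIRS = {
    "H2kB": ("gggtGGGGATTCCCCatct", "agatGGGGAATCCCCaccc"),
    "IL6kB": ("atgtGGGATTTTCCcatg", "catgGGAAAATCCCacat"),
    "IL8kB": ("atcgTGGAATTTCCtctg", "cagaGGAAATTCCAcgat"),
    "NFIL6": ("acgtcATTGCACAAtctt", "aagaTTGTGCAATgacgt"),
}

# Case-preserving complement table for the four DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, preserving case."""
    return seq.translate(_COMPLEMENT)[::-1]


def anneals_blunt(upper: str, lower: str) -> bool:
    """True if the two strands are fully complementary with no overhang."""
    return reverse_complement(upper) == lower


if __name__ == "__main__":
    for name, (upper, lower) in PAIRS.items():
        print(name, len(upper), "nt, fully complementary:",
              anneals_blunt(upper, lower))
```

Because the check preserves case, it also confirms that the upper case binding site sits at the corresponding position on both strands.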

For the experiments outlined below where it
is desirable to immobilize the oligonucleotides to
streptavidin coated plastic surfaces, all the upper
strand oligonucleotides listed above are synthesized
with a biotinylated thymidine nearest to the 3'
thymidine position, but not at the 3' end. This
biotin moiety is separated from the thymidine base by
a 20 carbon spacer arm. Therefore, when the
respective upper and lower strand oligonucleotide
pairs are annealed together, there is one biotin tag
on each pair. By kinasing the 5' end of each
synthetic oligonucleotide with T4 polynucleotide
kinase as described in Ausubel et al. (eds.), 1993,
Current Protocols in Molecular Biology, John Wiley and
Sons, Inc., New York, NY, then annealing the
respective pairs of oligonucleotides in 10 mM Tris,
1 mM EDTA, pH 7.5 (TE buffer) with 50 mM NaCl at 30°C
and then carrying out a ligation reaction (section
3.14.2 in Current Protocols in Molecular Biology), one
obtains DNA molecules with one or more transcription
factor binding sites per molecule. Unless otherwise
stated, all abbreviations and buffers are as described
in the Appendix in Current Protocols in Molecular
Biology. Thus, for the isolation of TSAR phage that
bind to DNA, one can prepare either a monovalent
target, solely by annealing the appropriate pairs of
oligonucleotides, or a multivalent target by the
ligation of kinased monovalent target molecules. The
ligated fragments of DNA can be analyzed by gel
electrophoresis with visualization of ethidium bromide
stained DNA using short wave UV light to determine the
average size of a ligated fragment and thus, the
average number of cis-elements per fragment. For the
experiments carried out below, it is optimal to have
either one site per molecule (unligated fragments) or
10 sites per molecule. Other numbers of sites are
less desirable. Fragments of the appropriate size can
be recovered by gel electrophoresis or velocity
sedimentation centrifugation techniques.
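
The relation between the size of a ligated fragment seen on a gel and the number of cis-elements it carries can be sketched as follows. This is a rough illustration only: it assumes blunt-end ligation of identical monomers with one binding site each and no loss of bases at the junctions, and the function names are hypothetical.

```python
def sites_per_fragment(fragment_bp: float, monomer_bp: int) -> float:
    """Estimate the number of binding sites in a ligated fragment.

    Assumes the fragment is a head-to-tail or head-to-head concatemer of
    identical monomers, each carrying exactly one binding site.
    """
    if monomer_bp <= 0:
        raise ValueError("monomer length must be positive")
    return fragment_bp / monomer_bp


def fragment_length_for_sites(n_sites: int, monomer_bp: int) -> int:
    """Length in base pairs of a concatemer carrying n_sites monomers."""
    if n_sites < 1 or monomer_bp <= 0:
        raise ValueError("need at least one site and a positive monomer length")
    return n_sites * monomer_bp


if __name__ == "__main__":
    # The H2kB oligonucleotide in Section 6.1.1 is 19 nucleotides long.
    print(fragment_length_for_sites(10, 19))  # size to recover for 10 sites
    print(sites_per_fragment(95, 19))         # sites in a 95 bp band
```

Under these assumptions, the ten-site H2κB target corresponds to a band of about 190 bp, and the unligated monomer to a band of 19 bp.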

Phage from a TSAR library that bind to
specific DNA sequences are isolated by a process of
screening called panning. In a preferred method,
"Universal Covalent" multiwell plates obtained from
Costar (Cambridge, MA), or similar multiwell plates,
are utilized. The specific DNA target is immobilized
directly to the wells in a procedure supplied by the
manufacturer. Costar's Universal Covalent surface is
intended to covalently immobilize biomolecules via an
abstractable hydrogen using UV illumination resulting
in a carbon-carbon bond. Although the linkage is
non-specific and does not allow for site-directed
orientation of a biomolecule, this surface is very
useful for the immobilization of double stranded DNA.
A 100 µl DNA sample (10 µg/ml in 50 mM Na Acetate
buffer) is added to each of the appropriate wells and
incubated at room temperature for 1 hr. The solution
is removed and the DNA covalently immobilized to the
surface by exposing the plate to UV light for the time
determined to be optimal in the UV Intensity
Verification section of the instructions provided by
Costar. Subsequently, each well is blocked with BSA
as described below. The advantage to this method is
that one obtains fewer TSAR binding phage targeted to
a substance such as streptavidin which, in an
alternative method, is used to attach the biotinylated
DNA to the multi-well plate surface. However, a
disadvantage is that the DNA is bound directly to the
plate, providing some degree of steric interference
with the binding of the desired TSAR phage to the DNA
target. This is less of a problem when using target
DNA immobilized using streptavidin due to the 20
carbon spacer linker between the biotin and the DNA.

Another method by which panning can be
carried out is as follows. Aliquots of phage are
first placed into ELISA plate wells coated only with
streptavidin. This removes those phage capable of
binding to streptavidin. Then the aliquots are
transferred to wells coated with target DNA
immobilized via a biotin linked to streptavidin bound
to the surface of an ELISA plate well. Preferably,
Reacti-Bind brand streptavidin coated strip plates,
cat. # 15124 are used. These are obtainable from
Pierce, 3747 North Meridian Road, Rockford, IL.

Less preferably, one may coat Costar brand
E.I.A./R.I.A. multi-well plates (cat. #3590). All
wells are coated with streptavidin using 1 µg/ml in
100 mM Na Bicarbonate, pH 8.4 buffer. 50 µl are added
to each well and incubated at 4°C overnight or 22°C
for 4 hrs. The streptavidin solution is pipetted
away, and the remaining binding surfaces are blocked
with 300 µl/well of a solution containing 5 mg/ml BSA
in 100 mM Na Bicarbonate, pH 8.4 buffer for 1 hr at
37°C or overnight at 4°C.

Initially, the wells on a coated 96-well
multi-well plate are labelled in pairs. Two wells are
used for each specific DNA target; one well for each
target is labelled "A" and the other is labelled "B".
The wells are washed with a Wash Buffer composed of 1X
phosphate buffered saline (PBS), 0.1% Tween 20 and
0.1% bovine serum albumin (BSA). The washing is
repeated three times.

The appropriate oligonucleotide fragments
(specific target DNA) are added to wells labelled "B"
using 50 µl of a DNA solution (20 µg/ml) in 1X PBS
buffer. The streptavidin is allowed to capture the
DNA for 2 hrs at room temperature. The DNA solutions
are withdrawn with a pipette and discarded. All
non-specifically bound DNA is washed away using the
wash buffer described above.

To begin the actual panning, 10^11 particles
of phage from a TSAR-9 or TSAR-12 or other suitable
library in 100 µl of 1X PBS with 0.1% BSA are added to
each well labelled "A" and allowed to incubate for 2
hrs at room temperature. To begin the specific
adsorption of phage that bind DNA, the adsorbed phage
are transferred from wells labelled "A" to wells
labelled "B". These are allowed to bind for 4 hrs at
room temperature. The unbound phage are carefully
pipetted away and each well is washed 4 times with 300
µl wash buffer. Care is taken not to cross
contaminate the wells.

Elution of DNA binding phage is done by the
addition of 60 µl 50 mM glycine-HCl pH 2.0 buffer for
10 minutes at 65°C. The eluted phage are transferred
to a 60 µl solution of 200 mM sodium phosphate, pH 7.4
to neutralize. One round of panning is now complete.

At this point it is preferred to amplify the
phage isolated from DNA coated wells. This population
of phage is not a pure population of phage that bind
specific DNA target sequences, but rather a population
that is substantially enriched for phage that bind to
the target sequences. In the present example, four
sublibraries are being isolated, one that binds to an
NFIL6 site and three that have specificity toward
different NF-κB sites. See Section 6.1.1 for a
description of the oligonucleotide target ligands for
these sublibraries. These sublibraries are composed
of "DNA binding phage" or DB phage.

Each sublibrary of DB phage is amplified,
titered and stored, e.g., as described in Kay et al.,
1993, Gene 128:59-65. To obtain phage that have the
highest affinity for each of the DNA binding targets,
three more rounds of panning are carried out in
substantially the same fashion as described above.
However, since the population of DNA binding phage
has been significantly enriched there are many copies
of each unique phage in each 10 µl aliquot of the
amplified sublibrary. Three rounds of additional
panning toward each target are carried out without
additional amplification of the phage in between
adsorption and elution. After the final elution and
neutralization, the phage are used to infect E. coli
strain DH5αF' and plated out to obtain substantially
pure plaques.

About 24 individual plaques panned against
each target sequence are picked and used to grow up a
stock of 2 ml of phage. The TSAR insert in each phage
is sequenced by the usual methods, e.g., Sanger
dideoxy sequencing. Generally one obtains several
identical or very similar phage in each group of 24
from one panning experiment. The translation of the
DNA sequence (syngene) encoding the TSAR domain within
the phage is the peptide sequence responsible for the
binding of a given phage to the DNA target sequence.
One isolate for each unique peptide encoding sequence
is utilized for further characterization of its DNA
binding properties.

6.1.3. CHARACTERIZATION OF BINDING PHAGE

The selectivity of a given phage for a
target sequence can be determined by any of several
means. One method involves the carrying out of a
"phage" ELISA utilizing an ELISA kit from Pharmacia
(Product #27-9402-01) composed in part of horseradish
peroxidase conjugated sheep anti-M13 antibody. The
phage ELISA is carried out as follows:

1. The phage to be tested is grown in
appropriate cells for 6 hrs to overnight in 2XYT at
37°C;

2. The cells are spun down and the
supernatant is collected. This supernatant can be
stored at 4°C for several days;

3. The wells of microtiter plates are
coated with the oligonucleotides the phage is to be
tested against. Coating can be done as in the panning
methods described above;

4. About 100 µl of blocking solution (1
mg/ml BSA in 100 mM NaHCO3, pH 8.3) is added and the
wells are incubated for 1 hr;

5. The liquid is flicked out and the wells
are washed in PBS with 0.1% Tween 20;

6. 50 µl of phage supernatant is added to
the wells;

7. The wells are incubated at room
temperature for 2 hrs;

8. The wells are washed multiple times
with PBS with 0.1% Tween 20. One wash should be
for about 10 minutes;

9. 50 µl of Pharmacia's horseradish
peroxidase conjugated sheep anti-M13 IgG (diluted
1:3000 in PBS with 0.1% Tween 20 and 1 mg/ml BSA)
is added and incubated for 1 hr at room temperature;

10. The wells are washed multiple times
with PBS with 0.1% Tween 20;

11. 100 µl of ABTS® (2,2'-Azino-di-[3-
ethylbenzthiazoline sulfonate (6)]) (0.2 mg/ml) is
added;

12. Color develops in 10 to 30 minutes and
absorbance at 405 nm is read in a microtiter plate
reader.

Thus, by following the instructions provided
by the manufacturer and utilizing ABTS® as substrate
for the peroxidase, one can detect the amount of phage
bound to a well in a multi-well plate in a
quantitative manner. The binding selectivity for a
given DNA target sequence of any unique TSAR phage can
be rapidly determined by bringing aliquots (10^8 phage
particles) of a particular phage into contact with
specific target DNAs immobilized in wells of a
multi-well plate. By using each of the 4 pairs of DNA
targets described above in Section 6.1.1,
appropriately immobilized, one can rapidly determine
which phage binds specifically to which DNA sequence.
A phage that binds to only one target sequence is a
highly specific DNA binding phage (HSDB phage). A
phage that binds to more than one target sequence is a
cross-specific DNA binding phage.
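
The distinction between an HSDB phage and a cross-specific phage can be expressed as a simple rule applied to the absorbance readings. The sketch below is illustrative only: the text does not state a numerical cutoff for "binds", so the `min_ratio` threshold, the function name, and the example readings are all assumptions.

```python
def classify_phage(signals, background, min_ratio=2.0):
    """Classify a phage clone from phage ELISA absorbance readings.

    signals    -- dict mapping target name to A405 on the DNA coated well
    background -- A405 on the BSA-only control well (must be positive)
    min_ratio  -- assumed signal/background cutoff for calling a target bound

    Returns (label, bound_targets).
    """
    if background <= 0:
        raise ValueError("background absorbance must be positive")
    bound = [name for name, a405 in signals.items()
             if a405 / background >= min_ratio]
    if len(bound) == 1:
        return "HSDB", bound
    if len(bound) > 1:
        return "cross-specific", bound
    return "non-binder", bound


if __name__ == "__main__":
    # Hypothetical readings, not data from the text.
    readings = {"H2kB": 1.20, "IL6kB": 0.11, "IL8kB": 0.09, "NFIL6": 0.12}
    print(classify_phage(readings, background=0.10))
```

In practice the cutoff would be chosen from the variability of the negative control wells rather than fixed in advance.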

Further characterization of a given peptide
displaying phage's ability to bind to a particular
target sequence can be determined by carrying out
so-called competition experiments routinely used by
those skilled in immunoassays. In these assays an
aliquot of phage is first brought into contact with
diverse pairs of non-biotinylated oligonucleotides
under conditions and for a time sufficient to allow
any binding to occur. Subsequently each aliquot is
added to a well of a multi-well plate coated with a
specific DNA target sequence. If the unlabeled
competitor DNA is unable to bind to the TSAR domain of
the phage, then the phage may bind to the specific DNA
target sequence immobilized in the well. The phage
bound in the well can be detected and measured by any
convenient technique, e.g., use of the horseradish
peroxidase conjugated sheep anti-M13 IgG and ABTS® of
the phage ELISA described above. If the unlabeled DNA
fragment is able to bind to the TSAR domain in the
phage, then that phage will be unable to bind to the
target DNA immobilized in the well. By varying the
concentration of unlabeled DNA in solution one can
estimate the relative affinity of a given phage TSAR
domain for a specific DNA target as well as make a
determination about the actual specificity of a TSAR
domain for a specific DNA site.

It will be apparent to one skilled in the
art that other means of analyzing the HSDB phage
exist. It should be noted that the use of the binding
competition experiments outlined earlier is very good
for doing this as one can get good estimates of
binding coefficients as well as dissociation constants
of the HSDBs for targets and/or unrelated DNA
sequences.
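
One common way to summarize a competition experiment of the kind described above is the competitor concentration that halves the uninhibited signal (an IC50). The following sketch estimates it by log-linear interpolation between the two bracketing data points. It is an assumption-laden illustration, not the method of the text: the function name and data are hypothetical, and an IC50 is only a relative measure of affinity, not itself a dissociation constant.

```python
import math


def ic50_by_interpolation(concs, signals, uninhibited):
    """Estimate the competitor concentration giving half-maximal signal.

    concs       -- competitor concentrations, ascending and all positive
    signals     -- ELISA signal (e.g. background-corrected A405) at each
                   concentration
    uninhibited -- signal with no competitor present

    Returns the interpolated concentration, or None if the signal never
    crosses half of the uninhibited value within the tested range.
    """
    if len(concs) != len(signals):
        raise ValueError("concs and signals must have the same length")
    half = uninhibited / 2.0
    points = list(zip(concs, signals))
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        if s_lo >= half >= s_hi and s_lo != s_hi:
            # Fraction of the way from the low to the high concentration,
            # interpolated linearly in signal and logarithmically in dose.
            frac = (s_lo - half) / (s_lo - s_hi)
            log_c = math.log10(c_lo) + frac * (
                math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_c
    return None


if __name__ == "__main__":
    # Hypothetical titration of unlabeled competitor DNA (arbitrary units).
    print(ic50_by_interpolation([1, 10, 100], [0.9, 0.5, 0.1], 1.0))
```

Comparing the value obtained with the target oligonucleotide as competitor against the values obtained with unrelated oligonucleotides gives a rough measure of specificity.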



6.1.4. PHAGE CONTAINING BINDING DOMAINS SPECIFIC
FOR NF-κB AND NF-IL6 BINDING SITES



HSDB phage that recognized the
oligonucleotide formed by the sequences H2κB-U and
H2κB-L (H2κB oligonucleotide, see Section 6.1.1 above)
were isolated from an R26 peptide display library.
(See Section 6.9.4 and Figure 9 regarding the R26
library).

The R26 library was screened and panned as
described above using "Universal Covalent" multiwell
plates with the H2κB oligonucleotide. Four phage
clones were isolated that exhibited specific binding
to the H2κB oligonucleotide. To examine further the
specificity of binding of these clones, they were
tested in a phage ELISA (as described above) against
the H2κB oligonucleotide and three other
oligonucleotides. The other oligonucleotides that
were tested against these four phage were: (1) the
oligonucleotide formed by IL6κB-U and IL6κB-L (IL6κB
oligonucleotide); (2) the oligonucleotide formed by
IL8κB-U and IL8κB-L (IL8κB oligonucleotide); (3) the
oligonucleotide formed by NFIL6-U and NFIL6-L (NFIL6
oligonucleotide, see Section 6.1.1).

Figure 4 shows the results of these ELISAs.
Clones 1, 2, and 6 all showed strong binding to the
H2κB oligonucleotide as compared to their binding to
background (the BSA coated plates). Clone 5 showed
less strong, but still about 2.5-fold greater, binding
to the H2κB oligonucleotide than to background. This
binding on the part of clones 1, 2, 5, and 6 can be
contrasted with the binding of m663 and a randomly
selected phage from the R26 library, both of which
bound the BSA coated plates about as well as they
bound the plates coated with the H2κB oligonucleotide.

The exquisite specificity of binding of
HSDBs is also evident in Figure 4. Clones 1, 2, 5,
and 6 all bound the H2κB oligonucleotide far better
than they bound any of the other three
oligonucleotides. This is seen by the much higher
ratios of binding to the oligonucleotide coated plates
versus binding to the BSA coated plates when the
oligonucleotide was the H2κB oligonucleotide rather
than any of the other three oligonucleotides.

The oligonucleotide sequences of the random
inserts of clones 1, 2, 5, and 6 were determined.
This allowed for the determination of the
corresponding amino acid sequences for the peptides
encoded by these inserts. The following were the
results obtained:

Clone 1 - SWCTYSGYCRVSSAGTAQRTSVDRDGM (SEQ
ID NO: 143)

Clone 2 - RTGNEQPPGSFGRAAGCFHPGCKYMKLN (SEQ
ID NO: 144)

Clone 5 - SDKYFHDIRKYHPAAATSYKTRPDMPST (SEQ
ID NO: 145)

Clone 6 - RTGNEQPPGSFGRAAGCFHPGCKYMKLN (SEQ
ID NO: 144)
Note that the sequences of clones 2 and 6
are identical. In the discussion that follows, clone
1 is called H2κB-1 and clone 2 is called H2κB-2.

HSDB phage that recognized the
oligonucleotide formed by the sequences NFIL6-U and
NFIL6-L (NFIL6 oligonucleotide, see Section 6.1.1
above) were isolated from the DC43 peptide display
library. (See Section 6.9.6 and Figure 13 regarding
the DC43 library).

The DC43 library was screened and panned as
described above using "Universal Covalent" multiwell
plates with the NFIL6 oligonucleotide. A phage clone
(NFIL6-1) was isolated that exhibited specific binding
to the NFIL6 oligonucleotide.

The oligonucleotide sequence of the random
insert of NFIL6-1 was determined. This allowed for
the determination of the corresponding amino acid
sequence for the peptide encoded by this insert. The
following was the result obtained:

NFIL6-1 - REWGVPGAHNRIRDHCNGPRCHAIRTNASHTQHISRPPD
(SEQ ID NO: 146)

The selectivity of binding of the phage
H2κB-1, H2κB-2, and NFIL6-1 toward the three
oligonucleotides H2κB, IL6κB, and NFIL6 was tested by
phage ELISA. Phage ELISAs were performed as in Section
6.1.3. Each phage was assayed for its ability to bind
to each oligonucleotide following immobilization of
that oligonucleotide in the well of a microtiter
plate. This binding was compared to the binding of
the phage to wells that had been coated with BSA
(bovine serum albumin). As a negative control, the
binding of the parental phage m663 to each of the
oligonucleotides was also assayed. The results are
shown in Figure 14.

Figure 14 shows that libraries of random
peptides can be successfully screened to identify and
isolate binding domains that specifically bind to
specific DNA sequences. Phage NFIL6-1 binds well to
the NFIL6 oligonucleotide with some binding to the
other two oligonucleotides. Phage H2κB-2 binds only
to the target H2κB oligonucleotide. Phage H2κB-1
shows much greater binding to the H2κB oligonucleotide
than to the other oligonucleotides.

The target specificity of H2κB-2 was
investigated further by testing its ability to bind to
DNA sequences that are variants of the H2κB
oligonucleotide. New DNA oligonucleotides were
synthesized; the sequences of these oligonucleotides
deviated from the sequence of the H2κB oligonucleotide
by only one or two (and in one instance, three) bases.

Oligonucleotides for the upper and lower
strands were synthesized. For each pair of upper and
lower strands, 20 µg of upper and lower
oligonucleotides were combined in 100 µl of TE buffer
supplemented with 200 mM NaCl. Annealing and
filling-in was carried out at 65°C for 5 minutes, 42°C
for 5 minutes, and 37°C for 15 minutes. The resulting
double stranded DNA fragments were diluted in PBS, pH
5.0. 50 µl of the appropriate DNA fragments were
added to Costar Universal Covalent microtiter plates
as in Section 6.1.2 and incubated overnight at 4°C.
Phage ELISA assays were carried out three times for
each target DNA using the H2κB-2 phage. The results
are shown in Table 8. For each variant DNA target,
the binding of H2κB-2 to that target was compared to
the binding of H2κB-2 to the H2κB oligonucleotide
(wild type (WT)) and expressed as percent binding
compared to the wild type oligonucleotide (% WT).
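
The "% WT" figure described above is a ratio of averaged ELISA readings. A minimal sketch of that arithmetic follows; the function name, the optional background subtraction, and the example readings are assumptions, since the text does not state whether background was subtracted before the ratio was taken.

```python
from statistics import mean


def percent_wt(variant_readings, wt_readings, background=0.0):
    """Express binding to a variant target as a percentage of wild type.

    variant_readings -- replicate A405 readings for the variant target
    wt_readings      -- replicate A405 readings for the wild type target
    background       -- A405 of a control well, subtracted from both means
                        (assumed optional; defaults to no subtraction)
    """
    wt = mean(wt_readings) - background
    if wt <= 0:
        raise ValueError("wild type signal must exceed background")
    return 100.0 * (mean(variant_readings) - background) / wt


if __name__ == "__main__":
    # Hypothetical triplicate readings, not data from Table 8.
    print(round(percent_wt([0.30, 0.32, 0.34], [1.00, 0.98, 1.02]), 1))
```

Values above 100 indicate a variant that binds the phage better than the wild type oligonucleotide does.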

TABLE 8

Specificity of H2κB-2 Binding to DNA Targets

Inspection of the data of Table 8 reveals
that a core group of "critical" residues (those whose
mutation results in decreased binding) contributes
significantly to the H2κB-2 phage's ability to bind
DNA. For example, the G at position 7, when mutated
to T (with no other changes), results in diminished
binding (32% of wild type). It is clear, however,
that the context of the critical residues is also very
important. When the G to T mutation at position 7 is
coupled to a G to T mutation at position 8 (see mutant
M-4), binding is further decreased to 17% of wild
type. Yet when the G to T mutation at position 8
occurs alone, binding is increased to 132% of wild
type (see mutant M-4b). A "consensus" DNA fragment
which has all the apparent critical residues, but is
flanked with a random array of nucleotides, is
virtually unable to bind the H2κB-2 phage clone.

6.1.5. PRODUCTION AND ANALYSIS OF MUTAGENIZED BINDING
PHAGE

For the present invention, the relative
specificity of an HSDB for a specific DNA site is more
important than the actual affinity of the domain for
the DNA site. In gene therapy, the effect of low
affinity can be compensated for by increased levels of
gene expression within the target cell. However, it
is very important that the specificity be high in
order to bring about the desired activation or
inhibition of the correct gene and not others.

If, after analysis of the DNA binding phages
isolated by the protocol outlined above, it is
desirable to have HSDBs with greater selectivity, then
the amino acid sequence encoded by the TSAR domain of
the phage having the best properties can be
mutagenized by conventional means known to those
skilled in the art. In particular, saturation
mutagenesis is carried out using synthetic
oligonucleotides synthesized from "doped" nucleotide
reservoirs. The doping is carried out such that the
original peptide sequence is represented only once in
10' unique clones of the mutagenized oligonucleotide.
The assembled oligonucleotides are cloned into the
parental TSAR vector. Preferably the vector is m663
(Fowlkes et al., 1992, BioTechniques 13:422-427).
m663 is able to make blue plaques when grown in E.
coli strain JM101 or DH5αF'. A library of greater than
10^6 is preferred; however a library of 10^5 is
sufficient to isolate TSAR phage displaying peptide
domains with increased selectivity for binding to the
target DNA sequence.

Once a DNA binding TSAR mutant library has
been constructed and amplified, it can be panned with
immobilized target DNA sequences in a manner analogous
to that described for the isolation of the initial DNA
binding phage. It is preferable, but not necessary,
to selectively pan out those phage capable of binding
to sequences related to the target DNA sequence.
Preferably, the related sequence is that of another
site subject to regulation by the same transcription
factor as the site desired to be regulated by the
syngene product. Therefore, it is preferable to use
four wells for this set of panning and phage
isolations. Well "A" is coated only with
streptavidin; well "B" is coated with a DNA binding
target that is totally unrelated to the relevant DNA
target; well "C" is coated with a DNA binding target
that is related to the relevant DNA target; and well
"D" is coated with the relevant DNA target. For the
actual panning, an aliquot of the mutant TSAR DNA
binding phage (50 µl, 10^11 phage/ml) is added first to
well "A", incubated for 2 hrs, then transferred to
well "B", incubated for 2 hrs, then transferred to
well "C" for 2 hrs and finally, transferred to well
"D" for 2 hrs. Well "D" is washed 4 times with wash
buffer and then eluted as described above. The
neutralized phage are used to infect E. coli and
stocks of the more specific DNA target binding phage
are prepared. From these stocks one repeats another 3
rounds of panning without intervening phage
amplification steps. The phage obtained by this
process are called "highly specific DNA binding phage"
or HSDB phage.

Another means for blocking the binding of
phage capable of binding to sequences related, but not
identical, to the target DNA sequence is to add
soluble DNA oligonucleotide fragments (to a final
concentration of 0.1 ng/ml) to aliquots of the mutant
library before panning. The added DNA oligonucleotide
fragments bear the sequences related (but not
identical) to the target sequence.

To characterize HSDB phage relative to
parental and other DB phage isolated from a TSAR-9
library, one takes advantage of the fact that the
TSAR-9 library phage have no lacZ activity, whereas
the mutant libraries used to clone the HSDB phage are
from a vector that expresses lacZ activity in the
appropriate E. coli cell lines. To determine if a
mutant binds to the target DNA with more specificity
than its parent DB phage, equal amounts of the two
phage to be compared are mixed together and then
applied to a well containing a target DNA sequence. In
addition, an equal amount of the two phage is
applied to a well containing another DNA sequence.
After the appropriate washing and elution, the phage
recovered are plaqued onto the appropriate E. coli in
the presence of X-gal. Ratios of blue to white phage
are calculated. Since the mutant phage can convert
X-gal to a blue product, one can readily determine
which mutant phage have improved DNA binding
specificity relative to DB phage isolated from the
TSAR-9 library. The mutants that bind less well to
the irrelevant DNA target are selected for
characterization for potential as therapeutic syngene
targeting domains.
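
The blue-to-white comparison described above can be sketched as a small calculation. This is only an illustration under stated assumptions: the function names and the plaque counts are hypothetical and do not come from the specification, and the score shown (ratio on the target well divided by ratio on the other well) is one reasonable way to summarize the two ratios, not a method the text prescribes.

```python
def blue_to_white_ratio(blue: int, white: int) -> float:
    """Ratio of blue (mutant, lacZ+) to white (parental, lacZ-) plaques."""
    if blue < 0 or white <= 0:
        raise ValueError("need a non-negative blue count and a positive white count")
    return blue / white


def relative_specificity(target_well, other_well) -> float:
    """Compare blue:white ratios recovered from two wells.

    Each argument is a (blue, white) tuple of plaque counts (hypothetical data).
    A result above 1 means the mutant is enriched, relative to its parent,
    more strongly on the target DNA than on the other DNA.
    """
    target_ratio = blue_to_white_ratio(*target_well)
    other_ratio = blue_to_white_ratio(*other_well)
    if other_ratio == 0:
        return float("inf")  # no mutant phage recovered from the other well
    return target_ratio / other_ratio


# Hypothetical counts: 120 blue / 40 white on target, 10 blue / 40 white on other DNA.
score = relative_specificity((120, 40), (10, 40))
print(score)
```

Because equal amounts of the two phage are mixed before panning, the input ratio is the same for both wells, so only the recovered ratios need to be compared.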

To determine critical residues for the
binding of H2κB-2 phage to its DNA target and to
determine if the affinity or specificity of binding of
this phage could be enhanced, a scheme was developed
to sample a large number of phage variants based on
the peptide encoded by H2κB-2.

Oligonucleotides O8909 and o8545 were
synthesized and purified by reverse phase HPLC (see
Figure 15). Double stranded DNA fragments were made
by annealing 10 µg of each oligonucleotide in 200 µl
of 1X Sequenase buffer (United States Biochemical
Corp., Cleveland, OH) with 10 mM DTT and 150 µM of
each dNTP. The fill-in reaction was initiated with 3
µl of Sequenase, Version 2.0 (United States
Biochemical Corp., Cleveland, OH) and carried out at
37°C for 30 minutes. The filled-in, double stranded
fragments were phenol/chloroform extracted and ethanol
precipitated. The dried pellet was resuspended in 60
µl of TE and digested with Xba I and Xho I (New
England Biolabs, Beverly, MA) according to the
supplier's recommendations. The appropriate sized DNA
fragments were purified by polyacrylamide gel
electrophoresis (PAGE). The fragments were cloned
into m663 in the usual manner. A library of about 1.5
x 10^8 phage variants was obtained. This library is
called the ME#1 library (see Figure 15). The ME#1
library was panned using the original H2κB
oligonucleotide target as well as with the IL6κB,
IL8κB, and NFIL6 oligonucleotides. A large number of
phage were identified that bound specifically to the
H2κB oligonucleotide; no phage were identified that
bound to any of the other targets.

Twenty-two clones from the ME#1 library that
bound the H2κB oligonucleotide (binders) were selected
for further analysis. Also, twenty-six clones were
identified that did not have significant ability to
bind the H2κB oligonucleotide (non-binders). In
addition, random plaques from the ME#1 library were
tested by phage ELISA for the ability to bind the H2κB
oligonucleotide; about one third demonstrated some
binding ability.






The amino acid sequences of the inserts of
the binders and non-binders from the ME#1 library were
determined. The sequences were analyzed to determine
the frequencies at which each amino acid appeared at
each position for the binders and for the non-binders.
The results are shown in Table 9.
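
The per-position frequency analysis described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name is hypothetical, and the input is just four of the strong-binder sequences listed in Table 10, chosen to keep the example short.

```python
from collections import Counter


def position_frequencies(sequences):
    """Return, for each position, a dict mapping residue -> observed fraction."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    frequencies = []
    for i in range(length):
        counts = Counter(seq[i] for seq in sequences)
        frequencies.append({aa: n / len(sequences) for aa, n in counts.items()})
    return frequencies


# Four 28-residue inserts taken from the strong binders of Table 10.
binders = [
    "SIDKVQPPGTSGRKTGCFHPGCKYIKLK",
    "STGKEHPPGSFGRATGCFHPGCKYIKLK",
    "RTGNDQPPRSFGHATGCFHPGCKYIKNK",
    "STGKDHAPSSFGRAAGCFHPGCKYIKHK",
]
freqs = position_frequencies(binders)

# Positions are 1-based in the text, so position 17 is index 16.
print(freqs[16])  # residue frequencies at position 17
print(freqs[26])  # residue frequencies at position 27
```

Running the same function separately on the binder and non-binder sets gives the two "Actual" frequency columns that are then compared against the expected frequency for each position.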



TABLE 9
Molecular Evolution of H2κB-2:
Frequency of Residues for Binding and Non-Binding Members

      K  25%   73%    N  51%   5%     25%  12%    N  51%  35%
26                    H  2.4%  18%                H  2.4%  0%
27    L  25%   18%    H  2.4%  73%    25%  62%    H  2.4%  0%
28    N  51%   18%    K  25%   73%    51%  50%    K  25%   12%

In Table 9, "Residue" indicates the residues
in that position in the insert of the original H2κB-2
clone from the amino terminal to the carboxy terminal
position; "p" indicates the percentage of clones
expected to have the original residue occurring at a
given position based on the scheme utilized for
synthesizing the oligonucleotides for the ME#1
library; "Actual" indicates the frequency that a
specific residue was observed to occur in a specific
position.

Table 9 shows that there was no apparent
bias in the sequences for both the binders and
non-binders in the first 16 residues. However, at
position 17, 100% of the binders have the original
cysteine, but only 31% of the non-binders do. For
both groups, this is more than the expected frequency
of 17%. It appears that this cysteine is critical for
the binders, but that it is also selected for in the
library as a whole.

There are significant differences between
the two groups for residues 18 through 28. While the
original residue is clearly favored in the binding
class for residues 18-24 and 26, this is not the case
for positions 25, 27, and 28. In position 25, the
original methionine was observed at less than the
expected frequency in both the binders and non-binders
and isoleucine was observed at higher than expected
frequency in both classes.

Residues 27 and 28 are very informative.
For the binders, histidine was observed at position 27
in 73% of the clones, but was never observed at that
position in the non-binders. At position 28, lysine
was observed in 73% of the binders but in only 12% of
the non-binders. Furthermore, the original asparagine
was expected at a frequency of 51%, but occurred in
only 18% of the binders.

Thus, while the H2κB-2 clone clearly binds
well to the H2κB oligonucleotide target, it does not
carry the optimal amino acid sequence to do so (see
below and Figures 16A and 16B for a further discussion
of this point). Most important, it was learned that
substituting a histidine for the leucine at position 27
and a lysine for the asparagine at position 28 leads
to avid binding of the clones to the target DNA.
Phage stocks were prepared for the binders
from the ME#1 library as well as for the original
H2κB-2 clone and the stocks were titered by serial
dilution. Subsequently, the appropriate dilutions of
each phage were analyzed by phage ELISA for binding to
the H2κB oligonucleotide. The results are shown in
Figures 16A and 16B. Binding is expressed as signal
strength (O.D.). Phage having higher relative avidity
for the target (as compared to H2κB-2) continue to
produce a detectable signal at low concentrations of
phage and thus the curves for those phage are shifted
to the left as compared to the curve for H2κB-2.

It is readily evident from Figures 16A and
16B that most binders from the ME#1 library have a
greater avidity for the target H2κB oligonucleotide
than does the original H2κB-2 clone. Of the 22
binders analyzed, 5 are less avid binders than H2κB-2
and many have comparable binding avidity; however,
over one third show avidity of more than two orders of
magnitude greater than that of H2κB-2. Table 10 shows
the amino acid sequences of the binders tested and
summarizes their binding avidity.

TABLE 10

Binding Strength of Clones from the ME#1 Library

                                                      SEQ ID
                                                      NO:
STRONG BINDERS
950406-10 (C1-9)    SIDKVQPPGTSGRKTGCFHPGCKYIKLK      147
950406-8            STGKEHPPGSFGRATGCFHPGCKYIKLK      148
950406-11           STGKEHSPGSLGRAPGCWKFGCKYIKLK      149
950406-1            RTGNDQPPRSFGHATGCFHPGCKYIKNK      150
950406-2            STGKDHAPSSFGRAAGCFHPGCKYIKHK      151
C1-8                RTGEDHPPGSFGKAAGCFHPGCKYIKHK      152
D1-11               RTGNDHPSVSYGRASGCFHAGCKYIKHK      153
D1-1                RTGNENPPGSLGRGTGCFHAACKYIKHK      154
F15-D1              RAGTDQPPGSFVRASGCFHPGCKYIKHK      155

MEDIUM BINDERS
F15-A9b             RTGVDQSPESFDRATGCFHPGCKYIKHN      156
F15-B2b             RTGNDDPTGALARNSGCYHPGCKYINHK      157
F15-A4              RTGNDNPPGSSGPASGCFHPGCKYMKHK      158
C1-11               STGKEHPPGSFGRVSGCFHPGCKYMKHN      159
C1-6                RTGNDHPSGSFEHAAGCFHPGCKYMKHK      160
D1-3                RTGDDHPTGSIGRASGCFHPGCKYIKHN      161
H2κB-2              RTGNEQPPGSFGRAAGCFHPGCKYMKLN      144

WEAKER BINDERS
B1-12               STGKEHPPGSIGRASGCFHPGCKYIRHT      162
D1-7                RTGNEHPPGSIGRASGCFHPGCNYIKHK      163
F15-B3              RTGNEHPPGSLGRAPGCYHPGCKYIHHP      164
F15-A7              RTGNYDPPGSFGPSTGCFHPGCKYIHHK      165
A1-13               STGNEQHPGSLRRASGCFHPGCKYIHHN      166
F15-B7              STGNEHPPGSVGHAPGCFHPGCKYMHLK      167


These results demonstrate that the above
strategy for evolving a peptide sequence isolated from
a random peptide library towards one with greater
binding avidity for its target was highly successful.

A group of the binders from the ME#1 library
was tested by phage ELISA for the ability to bind to
the same variant H2κB oligonucleotide targets as were
described in Table 8. H2κB-2 was similarly tested.
In order to compare the binding of these clones, the
O.D. for binding to the original H2κB oligonucleotide
target (WT) was normalized to 100%. Table 11 shows
the results.



TABLE 11

Binding of Clones from Molecular Evolution
Library #1 to H2κB Variant Targets

Clones    H2κB-2   950406-10   F15-C1   F15-B3   F15-A7   F15-B2
O.D.      0.527    1.021       0.815    0.117    0.257    0.480
WT        100      100         100      100      100      100
M-1       46       82          67       82       95       631
M-2       75       76          69       85       86       57
M-3       106      76          93       111      86       143
M-4       16       24          10       47       46       16
M-4a      19       30          17       55       54       12
M-4b      94       52?         59       64       71       103
M-5       184      104         146      366      258      198
M-5a      175      116         154      276      249      206
M-5b      53       66          60       67       73       47
M-4/5+    195      107         133      503      286      209
M-4/5-    25       57          33       61       73       27
M-6       74       79          79       71       44       56
M-6a      161      115         129      235      179      163
M-6b      155      103         119      194      169      182
M-7       102      76          92       77       79       70
M-8       126      83          105      122      150      162
M-9       94       89          108      80       105      75
cons.     11       2           4        46       37       el
BSA       6        3           5        29       9        6
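
The normalization used for Table 11 (binding to the original target set to 100%) can be sketched as below. This is an illustration only: the function name is hypothetical, the wild-type O.D. of 0.527 is the H2κB-2 value from the table, and the raw O.D. of 0.084 for the M-4 target is an assumed reading chosen for the example, since the raw variant readings are not given in the text.

```python
def normalize_to_wild_type(od_by_target, wild_type_key="WT"):
    """Express each O.D. reading as a whole-number percentage of the WT reading."""
    wild_type_od = od_by_target[wild_type_key]
    if wild_type_od <= 0:
        raise ValueError("wild-type O.D. must be positive")
    return {
        target: round(100 * od / wild_type_od)
        for target, od in od_by_target.items()
    }


# WT value from Table 11 (H2kappaB-2 column); the M-4 reading is hypothetical.
raw_od = {"WT": 0.527, "M-4": 0.084}
percent = normalize_to_wild_type(raw_od)
print(percent)
```

Normalizing each clone to its own wild-type reading is what allows clones with very different absolute signals (for example 1.021 versus 0.117) to be compared on the same scale.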






All of the binders that had previously been
shown to have increased avidity for the H2κB
oligonucleotide (as compared to H2κB-2) maintained the
same relative target specificity as H2κB-2. However,
some binders with less avidity for the H2κB
oligonucleotide had increased avidity for certain of
the variant H2κB oligonucleotide targets.

Since the critical residues identified in
the first round of molecular evolution were at the
carboxy terminal end of the expressed peptide insert,
it was not possible to rule out some contribution from
residues in the phage pIII protein to the DNA binding
activity. A scheme was developed to construct
additional molecular evolution libraries for panning
with the H2κB oligonucleotide. This was done to
ensure that the critical DNA binding residues would be
able to function in the context of other gene
sequences, e.g., nuclear localization sequences,
transcriptional activation sequences, for constructing
a functional syngene directed to the regulation of the
activity of a gene with H2κB sequences.

Two libraries were constructed. Molecular
evolution library #2a (the ME#2a library) contained
the critical residues as a fixed core flanked by 10
random residues on each side. Molecular evolution
library #2b (the ME#2b library) was constructed in a
similar manner, but with a smaller group of core
residues. Specifically, the ME#2b library lacks the
initial arginine, serine, glycine, and arginine found
within the core of the ME#2a library. In addition,
the core of the ME#2b library was flanked with 12
random residues on each side. Figures 17A and 17B
summarize the construction of the ME#2a and ME#2b
libraries.

The ME#2a and ME#2b libraries were panned
using the H2κB oligonucleotide as ligand. A number of
clones were isolated as binders from the ME#2a and
ME#2b libraries. Phage stocks were prepared for the
binders from the ME#2a and ME#2b libraries as well as
for the original H2κB-2 clone and the stocks were
titered by serial dilution. Subsequently, the
appropriate dilutions of each phage were analyzed by
phage ELISA for binding to the H2κB oligonucleotide.
The results are shown in Figure 18. Binding is
expressed as signal strength (O.D.). Phage having
higher relative avidity for the target (as compared to
H2κB-2) continue to produce a detectable signal at low
concentrations of phage and thus the curves for those
phage are shifted to the left as compared to the curve
for H2κB-2.

While many clones from the ME#2a and ME#2b
libraries did not have enhanced binding activity as
compared to H2κB-2, it is apparent that a significant
number have an apparent avidity two to three orders of
magnitude greater than that of H2κB-2. However, none
of the clones from the ME#2a or ME#2b libraries had an
avidity as great as that of clone 950406-10 from the
ME#1 library.

Table 12 shows the amino acid sequences of
the inserts of the binders from the ME#2a and ME#2b
libraries as well as the inserts of clone 950406-10
and H2κB-2.



TABLE 12

                                                          SEQ ID
                                                          NO.
J5-1A7      SADLVAPSGENRSGRGCFHPGCKYIKHKPTTPPAPSSA        168
J5-2A7      SAATADLGVGERSGRGCFHPGCKYIKHKGSTSQQSQPL        169
J5-1A10     SPDTKSPGGDVRSGRGCFHPGCKYTKHKSDSSHXEATT        170
J5-2A4      SVTSDDPQGRERSGRGCFHPGCKYIKHKNHATSESPNL        171
J5-2A8      SGREPEGPSEQRSGRGCFHPGCKYIKHKTOSTMHTTPA        172
J5-2B10     SFRALDEPTATRSGCFHPGCKYIKHKSPKDSFPNTTAA        173
J5-1B2      SGTVPTEAVNNRPGCTHPGCKYIKHKLPROPSPTPPLA        174
J5-1B6      SDWSERYGGERGGGCFHPGCKYIKHKNKNSGTPLPSDP        175
J5-1B1      SNTDT AWACS NGDRGCFHPGCXYIKHKTPP ^ I TJK TVLT P  176
J5-2B3      SGNVDNGTERGGKGCFHPGCKYIKHKRPNAYEVVPPLD        177
J5-2B2      SERAERGEGNWSAGCFHPGCKYIKHKSPRGANRSLVGA        178
J5-2B4      SSDGDTPGGSRKGGCFHPGCKYIKHKPSPQLPREGTHN        179
950406-10   SIDKVQPPGTSGRKTGCFHPGCKYIKLK                  147
H2κB-2      RTGNEQPPGSFGRAAGCFHPGCKYMKLN                  144



It can be seen from Table 12 that certain
residues are highly favored in the sequences flanking
the critical core residues. This is particularly
apparent in the clones from the ME#2b library which
lacked the initial arginine, serine, glycine, and
arginine residues found in the core of the ME#2a
library. In all clones from the ME#2b library that
bound well to the target DNA there are charged
residues in the four residues upstream of the core.

6.2. ISOLATION OF HSDBs FOR GATA
TRANSCRIPTION FACTOR BINDING SITES

The methods described in Section 6.1 and its
subsections above for the isolation of HSDBs for NF-κB
and NF-IL6 binding sites can be easily modified to
isolate HSDBs for GATA transcription factor binding
sites.

The major modification consists in the use
of DNA target sequences from GATA transcription factor
binding sites rather than DNA target sequences from
NF-κB or NF-IL6 binding sites. For example, for the
GATA-2 site in the human preproendothelin-1 gene, the
target DNA sequence is 5'ctggccTTATCTccggct (SEQ ID
NO: 180) for the upper strand and 5'agccggAGATAAggccag
(SEQ ID NO: 181) for the lower strand (Dorfman et al.,
1992, J. Biol. Chem. 267:1279-1285).
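
Because each target is given as an upper strand and a lower strand, a quick consistency check is to confirm that the lower strand is the reverse complement of the upper strand. The short sketch below does this for the GATA-2 site quoted above; the helper name is an assumption made for the example, and letter case is preserved so that the upper-case binding site remains visible.

```python
# Translation table for complementing DNA while preserving letter case.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sequence.translate(_COMPLEMENT)[::-1]


upper_strand = "ctggccTTATCTccggct"  # SEQ ID NO: 180 as quoted in the text
lower_strand = reverse_complement(upper_strand)
print(lower_strand)
```

The same check can be applied to any other upper and lower strand pair given for a target oligonucleotide before the oligonucleotides are ordered or annealed.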

The human gastric (H⁺ + K⁺) ATPase gene is
responsible for maintaining the large (about 5 units)
pH difference between the cytoplasm and gastric lumen
in the stomach. It has been proposed that regulation
of the human gastric (H⁺ + K⁺) ATPase gene would be
useful in the treatment of gastric ulcers (Maeda,
1994, J. Biochem. 115:6-14). It would be useful to
have syngenes that encode HSDBs that could be used to
modulate the activity of the human gastric (H⁺ + K⁺)
ATPase gene. Such HSDBs can be obtained by screening
random peptide libraries with an oligonucleotide
formed from an upper strand of 5'gacatGGGGGGATCTGGgca
(SEQ ID NO: 182) and a lower strand of
5'tgcCCAGATCCCCCCatgtc (SEQ ID NO: 183). This
oligonucleotide represents a sequence similar to that
of the binding site for the GATA-GT family of
transcription factors (Maeda et al., 1990, J. Biol.
Chem. 265:9027-9032).
For the GATA-3 site in the human T cell
receptor δ gene enhancer, the target DNA sequence is
5'gacactTGATAAcagaaa (SEQ ID NO: 184) for the upper
strand and 5'tttctgTTATCAagtgtc (SEQ ID NO: 185) for
the lower strand (Ko et al., 1991, Mol. Cell. Biol.
11:2778-2784).

As for NF-κB and NF-IL6 transcription factor
binding site HSDBs, the DNA target sequences for the
isolation of GATA transcription factor binding site
HSDBs can be monovalent or multivalent. Panning,
amplification, and analysis of the isolated phage are
done as for NF-κB or NF-IL6. Mutagenesis to produce
HSDBs of greater selectivity can also be done as for
NF-κB or NF-IL6.

6.3. ISOLATION OF HSDBs FOR AP-1
TRANSCRIPTION FACTOR BINDING SITES

The transcription factor AP-1, which
consists of a heterodimer of the products of the c-fos
and c-jun proto-oncogenes, is involved in the
transcriptional regulation of a number of genes. For
example, an AP-1 binding site is a component of the
TPA (12-O-tetradecanoyl-phorbol-13-acetate)-responsive
enhancer element that is involved in conferring
serum-responsive transcriptional activity to many genes
(Bohmann et al., 1987, Science 238:1386-1392; Angel et
al., 1988, Nature 332:166-171). It would be of great
value to have syngenes that encoded products that
could specifically bind to AP-1 sites, since this
would allow for the regulation of many genes that are
involved in the growth response of cells.

The methods described in Section 6.1 and its
subsections above for the isolation of HSDBs for NF-κB
and NF-IL6 binding sites can be easily modified to
isolate HSDBs for AP-1 transcription factor binding
sites.
The major modification consists in the use
of DNA target sequences from AP-1 transcription factor
binding sites rather than DNA target sequences from
NF-κB or NF-IL6 binding sites. In general, the target
oligonucleotides will have the following sequences:
5'xxxxxTGA(G/C)T(C/A)Axxxxx (SEQ ID NO: 186) for the
upper strand and 5'xxxxxT(G/T)A(C/G)TCAxxxxx (SEQ ID
NO: 187) for the lower strand. See Gambari and
Nastruzzi, 1994, Biochemical Pharmacology 4:599-610.
Where two nucleotides are shown in parentheses, the
choice of which nucleotide to use will depend upon
which specific AP-1 binding site it is desired to
isolate an HSDB for. Similarly, the choice of which
nucleotide to use in the positions marked by an x will
depend upon the nucleotide at those positions in the
specific AP-1 binding site it is desired to isolate an
HSDB for.
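
The bracketed alternatives in the general AP-1 target can be enumerated mechanically. The sketch below expands the core of the upper strand, TGA(G/C)T(C/A)A, into every concrete sequence it covers; the function name is hypothetical, and the flanking positions marked x are left out because they depend on the particular site chosen.

```python
import itertools
import re


def expand_degenerate(pattern: str):
    """Expand a pattern such as 'TGA(G/C)T(C/A)A' into all concrete sequences."""
    # Each token is either a parenthesized set of alternatives or a single base.
    tokens = re.findall(r"\(([^)]+)\)|([A-Za-z])", pattern)
    choices = [alt.split("/") if alt else [base] for alt, base in tokens]
    return ["".join(combo) for combo in itertools.product(*choices)]


ap1_cores = expand_degenerate("TGA(G/C)T(C/A)A")
for core in ap1_cores:
    print(core)
```

Two positions with two choices each give four candidate cores; the one matching the specific AP-1 site of interest is then placed between the site's own flanking nucleotides.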

As for NF-κB and NF-IL6 transcription factor
binding site HSDBs, the DNA target sequences for the
isolation of AP-1 transcription factor binding site
HSDBs can be monovalent or multivalent. Panning,
amplification, and analysis of the isolated phage are
done as shown above for NF-κB or NF-IL6. Mutagenesis
to produce HSDBs of greater selectivity can also be
done as for NF-κB or NF-IL6.

6.4. ISOLATION OF HSDBs FOR ATF
TRANSCRIPTION FACTOR BINDING SITES

The methods described in Section 6.1 and its
subsections above for the isolation of HSDBs for NF-κB
and NF-IL6 binding sites can be easily modified to
isolate HSDBs for ATF transcription factor binding
sites.

The major modification consists in the use of DNA
target sequences from ATF transcription factor binding
sites rather than DNA target sequences from NF-κB or
NF-IL6 binding sites. In general, the target
oligonucleotides will have the following sequences:
5'xxxxxTGACG(C/T)(C/A)(G/A)xxxxx (SEQ ID NO: 188) for
the upper strand and 5'xxxxxT(G/T)A(C/G)TCAxxxxx (SEQ
ID NO: 189) for the lower strand. See Gambari and
Nastruzzi, 1994, Biochemical Pharmacology 4:599-610.
Where two nucleotides are shown in parentheses, the
choice of which nucleotide to use will depend upon
which specific ATF binding site it is desired to
isolate an HSDB for. Similarly, the choice of which
nucleotide to use in the positions marked by an x will
depend upon the nucleotide at those positions in the
specific ATF binding site it is desired to isolate an
HSDB for.

As for NF-κB and NF-IL6 transcription factor
binding site HSDBs, the DNA target sequences for the
isolation of ATF transcription factor binding site
HSDBs can be monovalent or multivalent. Panning,
amplification, and analysis of the isolated phage are
done as shown above for NF-κB or NF-IL6. Mutagenesis
to produce HSDBs of greater selectivity can also be
done as for NF-κB or NF-IL6.



6.5. ISOLATION OF HSDBs FOR ETS
TRANSCRIPTION FACTOR BINDING SITES

The methods described in Section 6.1 and its
subsections above for the isolation of HSDBs for NF-κB
and NF-IL6 binding sites can be easily modified to
isolate HSDBs for binding sites for members of the Ets
family of transcription factors.

The major modification consists in the use of DNA
target sequences from specific Ets transcription
factor binding sites rather than DNA target sequences
from NF-κB or NF-IL6 binding sites. In general, the
target oligonucleotides will have sequences determined
by the specific nucleotide sequence of the specific
Ets transcription factor binding site that it is
desired to regulate.
For example, if it is desired to isolate an
HSDB specific for the stromelysin gene, the target
oligonucleotide will have an upper strand of
5'GCAGGAAGCA (SEQ ID NO: 190) and the lower strand
will be 5'TGCTTCCTGC (SEQ ID NO: 191). This
oligonucleotide corresponds to an Ets binding site in
the stromelysin promoter (Wasylyk et al., 1993, Eur. J.
Biochem. 211:7-18).

If it is desired to isolate an HSDB specific
for the T cell receptor α gene, the target
oligonucleotide will have an upper strand of
5'AGAGGATGTG (SEQ ID NO: 192) and the lower strand
will be 5'CACATCCTCT (SEQ ID NO: 193). This
oligonucleotide corresponds to an Ets binding site in
the T cell receptor α gene promoter (Wasylyk et al.,
1993, Eur. J. Biochem. 211:7-18).

If it is desired to isolate an HSDB specific
for the c-fos gene, the target oligonucleotide will
have an upper strand of 5'ACAGGATGTC (SEQ ID NO: 194)
and the lower strand will be 5'GACATCCTGT (SEQ ID NO:
195). This oligonucleotide corresponds to an Ets
binding site in the serum response element of the
c-fos gene promoter (Wasylyk et al., 1993, Eur. J.
Biochem. 211:7-18).

As for NF-κB and NF-IL6 transcription factor
binding site HSDBs, the DNA target sequences for the
isolation of Ets transcription factor binding site
HSDBs can be monovalent or multivalent. Panning,
amplification, and analysis of the isolated phage are
done as shown above for NF-κB or NF-IL6. Mutagenesis
to produce HSDBs of greater selectivity can also be
done as for NF-κB or NF-IL6.




6.6. ISOLATION OF BINDING DOMAINS
SPECIFIC FOR LUMENAL PROTEINS
OF THE ENDOPLASMIC RETICULUM

To isolate binding domains specific for
lumenal proteins of the endoplasmic reticulum, use is
made of the conserved tetrapeptide KDEL (SEQ ID NO:
116), which is found at the carboxy terminus of such
proteins. Libraries such as, for example, the TSAR
libraries described herein, are screened with a
synthetic peptide ligand of the following sequence:
KDELXXXXXX (SEQ ID NO: 196), where the identities of
the positions marked with an X are determined by the
specific amino acids at those positions in the
specific lumenal proteins for which it is desired to
isolate a binding domain.

The synthetic peptides can be synthesized by 
any of several well known methods in the art. 
Panning, amplification, and analysis of the isolated 
phage are done in a manner that is broadly similar to 
the manner in Sections 6.1.2 and 6.1.3, with the 
difference that the ligand used is a peptide rather 
than an oligonucleotide. Methods of screening are 
disclosed in Sections 5.2 and 6.1.2. Alternatively, 
any well known method of screening a peptide library 
using a peptide as a ligand may be used. 



6.7. ISOLATION OF BINDING DOMAINS
SPECIFIC FOR INTEGRAL MEMBRANE
PROTEINS OF THE TRANS GOLGI NETWORK

To isolate binding domains specific for
integral membrane proteins of the trans Golgi network,
use is made of the tetrapeptide YQRL (SEQ ID NO: 117),
which has been found to be necessary and sufficient
for the targeting of membrane proteins to the trans
Golgi network (Bos et al., 1993, EMBO J. 12:2219-2228;
Humphrey et al., 1993, J. Cell Biol. 120:1123-1135).
Libraries such as, for example, the TSAR libraries
described herein, are screened with a synthetic
peptide ligand of the following sequence: XXXYQRLXXX
(SEQ ID NO: 197), where the identities of the positions
marked with an X are determined by the specific amino
acids at those positions in the specific proteins for
which it is desired to isolate a binding domain.

The synthetic peptides can be synthesized by
any of several well known methods in the art.
Panning, amplification, and analysis of the isolated
phage are done in a manner that is broadly similar to
the manner in Sections 6.1.2 and 6.1.3, with the
difference that the ligand used is a peptide rather
than an oligonucleotide. Methods of screening are
disclosed in Sections 5.2 and 6.1.2. Alternatively,
any well known method of screening a peptide library
using a peptide as a ligand may be used.



6.8. BIOLOGICAL ACTIVITY OF SYNGENE-ENCODED
PRODUCTS THAT BIND TO THE NF-IL6 AND
NF-κB SITES






In order to demonstrate the biological
activity and utility of the HSDB-containing syngenes,
one can exploit the observation of Stein and Baldwin,
1993, Mol. Cell. Biol. 13:7191-7198 that there is a
functional and physical association of the NF-κB
family members with NF-IL6 (also known as C/EBPβ)
family members. The interaction of NF-κB with NF-IL6
is such that, when both transcription factors are
expressed, promoters with only NF-κB binding sites are
inhibited but promoters with both NF-κB and NF-IL6
binding sites are synergistically stimulated. NF-κB
and NF-IL6 are activated by important inflammatory
cytokines such as interleukin-1 and IL-6. Therefore,
the interactions between NF-κB and NF-IL6 are likely
to be involved in T-cell activation and the
acute-phase response. Several genes having promoters
that have closely spaced NF-κB and NF-IL6 binding
sites have been cloned. Examples are the IL-6 and the
IL-8 genes.

Genes such as IL-6 and IL-8 that have both
NF-IL6 and NF-κB sites are transcriptionally activated
by the induction and expression of those transcription
factors together. In addition, genes such as the MHC
class I genes (for example H-2Kb) that have only an
NF-κB site (and no NF-IL6 site) are transcriptionally
antagonized by the expression of NF-IL6 and NF-κB
together. They are, however, transcriptionally
activated by the induction and expression of NF-κB
alone. These observations allow one to readily
determine the action of the syngenes that bind to the
NF-IL6 and NF-κB sites when the appropriate syngenes
are expressed in tissue culture cells along with
appropriate reporter gene constructs.
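
The expected reporter behavior summarized in the two preceding paragraphs can be written down as a small lookup, which may help when planning which transfections serve as controls. This is only a sketch of the three cases the text actually states; the function and argument names are hypothetical, and any combination not described in the text is reported as "not stated" rather than guessed.

```python
def expected_reporter_response(has_nfkb_site: bool, has_nfil6_site: bool,
                               nfkb_expressed: bool, nfil6_expressed: bool) -> str:
    """Return the promoter response described in the text for a given setup."""
    both_factors = nfkb_expressed and nfil6_expressed
    if has_nfkb_site and has_nfil6_site and both_factors:
        # e.g. IL-6 and IL-8 promoters
        return "activated (synergistic)"
    if has_nfkb_site and not has_nfil6_site:
        # e.g. MHC class I promoters such as H-2Kb
        if both_factors:
            return "antagonized"
        if nfkb_expressed:
            return "activated"
    return "not stated"


# Promoter with only an NF-kappaB site, both factors expressed.
print(expected_reporter_response(True, False, True, True))
# Promoter with both sites, both factors expressed.
print(expected_reporter_response(True, True, True, True))
```

A syngene product that blocks one of the sites would be expected to shift a reporter from one of these outcomes toward another, which is what the CAT assays are meant to detect.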

The basic assay involves the introduction of
reporter constructs, syngene constructs and plasmids
expressing one or more transcriptional factors.

Reporter constructs: 1) Three copies of the
H2κB binding site fused upstream of a minimal fos
promoter linked to the chloramphenicol
acetyltransferase (CAT) gene (called pMHC-CAT herein,
see Scheinman et al., 1993, Mol. Cell. Biol.
13:6089-6101); 2) Plasmid pNFIL-6-CAT, which has a single
copy of an oligonucleotide with a NFIL-6 site derived
from the c-fos serum response element
(5'-agcttgATTAGGACATcg-3' (SEQ ID NO: 198) (binding site
printed in upper case)) in a HindIII-BamHI-cut
pTATA-CAT (referred to as C/EBPbeta-TATA-chloramphenicol
acetyltransferase reporter plasmid by Stein et al.,
1993, Mol. Cell. Biol. 13:3964-3974); 3) Plasmid
pIL-8-CAT, an IL-8 wild-type reporter construct generated by
cloning a single copy of an oligonucleotide
encompassing the bp -97 to -69 region of the human
IL-8 gene (5'-agcttcatCAGTTGCAAATCGTGGAATTTCCTctg-3' (SEQ
ID NO: 199) (binding sites for NF-IL6 and NF-κB are in
boldface type)) into HindIII-BamHI-cut TATA-CAT.

All transcription factor expression 
constructs used are described in Stein et al. (1993, 
Mol. Cell. Biol. 13:3964-3974). They are: pCMV4T-p65, 
which contains the cDNAs encoding human NF-κB p65; and 
plasmid pCMV4T-rC/EBPbeta, which contains a cDNA 
encoding the rat C/EBPbeta (also known as NF-IL6). 

Transfection of cells and analysis of CAT 
activity are done as follows. All cell lines are 
cultured in Iscove's Dulbecco modified Eagle medium 
supplemented with 7.5% fetal calf serum and 
antibiotics. Mouse F9 embryonal carcinoma cells are 
transiently transfected by the calcium phosphate method 
(Graham et al., 1973, Virology 52:456-467) (see also 
Current Protocols in Molecular Biology, section 9.1.1, 
Supplement 14, 1990). Monkey COS cells are 
transiently transfected with plasmid DNA by the 
DEAE-dextran method (Kawai et al., 1984, Mol. Cell. Biol. 
4:1172-1174) (see also Current Protocols, section 
9.2.1, Supplement 14, 1987). Chloramphenicol 
acetyltransferase (CAT) enzymatic activity is assayed 
as previously described using the Phase-Extraction 
Assay (see Current Protocols in Molecular Biology, 
Supplement 14, section 9.6.6); less preferably, a 
chromatographic assay for CAT may be used (see Current 
Protocols in Molecular Biology, Supplement 14, section 
9.6.3). 

The following Tables 13 and 14 outline the 
different combinations of experiments that are used to 
verify the biological effect of each of the types of 
syngenes that are the subject of this specific 
embodiment of the invention. When examining these 
tables, it should be kept in mind that p65 is NF-κB 
and C/EBP is an NF-IL6. 
Table 13 

Syngene           Promoter-Transcription    Promoter-Reporter   Expected 
Construct Used    Factor Expressed          Construct Used      CAT Activity 
----------------  ------------------------  ------------------  ------------ 
None              pCMV4T-p65                pMHC-CAT            High 
None              pCMV4T-rC/EBPbeta         pMHC-CAT            Low 
None              both* (see note below)    pMHC-CAT            Low 
HSDB-NFκB-MHC     pCMV4T-p65                pMHC-CAT            Low 
HSDB-NFκB-MHC     pCMV4T-rC/EBPbeta         pMHC-CAT            Low 
HSDB-NFκB-MHC     both                      pMHC-CAT            Low 
HSDB-NFκB-IL-8    pCMV4T-p65                pMHC-CAT            High 
HSDB-NFκB-IL-8    pCMV4T-rC/EBPbeta         pMHC-CAT            Low 
HSDB-NFκB-IL-8    both                      pMHC-CAT            Low 
HSDB-NF-IL-6      pCMV4T-p65                pMHC-CAT            High 
HSDB-NF-IL-6      pCMV4T-rC/EBPbeta         pMHC-CAT            Low 
HSDB-NF-IL-6      both                      pMHC-CAT            Low 
None              pCMV4T-p65                pNF-IL-8-CAT        Med 
None              pCMV4T-rC/EBPbeta         pNF-IL-8-CAT        Med 
None              both                      pNF-IL-8-CAT        High 
HSDB-NFκB-MHC     pCMV4T-p65                pNF-IL-8-CAT        Low 
HSDB-NFκB-MHC     pCMV4T-rC/EBPbeta         pNF-IL-8-CAT        Med 
HSDB-NFκB-MHC     both                      pNF-IL-8-CAT        Med 
HSDB-NFκB-IL-8    pCMV4T-p65                pNF-IL-8-CAT        Low 
HSDB-NFκB-IL-8    pCMV4T-rC/EBPbeta         pNF-IL-8-CAT        Med 
HSDB-NFκB-IL-8    both                      pNF-IL-8-CAT        Med 
HSDB-NFκB-IL-6    pCMV4T-p65                pNF-IL-8-CAT        Med 
HSDB-NFκB-IL-6    pCMV4T-rC/EBPbeta         pNF-IL-8-CAT        Low 
HSDB-NFκB-IL-6    both                      pNF-IL-8-CAT        Med 

* "both" refers to PMA treatment of cells, which leads to the activation of 
endogenous NF-κB and C/EBP and thus to the same results as co-transfection with 
both of these transcription factors. 
Table 14 

Syngene           Promoter-Transcription    Promoter-Reporter   Expected 
Construct Used    Factor                    Construct Used      CAT Activity 
----------------  ------------------------  ------------------  ------------ 
None              pCMV4T-p65                pNF-IL-6-CAT        Med 
None              pCMV4T-rC/EBPbeta         pNF-IL-6-CAT        Med 
None              both                      pNF-IL-6-CAT        High 
HSDB-NFκB-MHC     pCMV4T-p65                pNF-IL-6-CAT        Med 
HSDB-NFκB-MHC     pCMV4T-rC/EBPbeta         pNF-IL-6-CAT        Med 
HSDB-NFκB-MHC     both                      pNF-IL-6-CAT        High 
HSDB-NFκB-IL-8    pCMV4T-p65                pNF-IL-6-CAT        Med 
HSDB-NFκB-IL-8    pCMV4T-rC/EBPbeta         pNF-IL-6-CAT        Med 
HSDB-NFκB-IL-8    both                      pNF-IL-6-CAT        High 
HSDB-NF-IL-6      pCMV4T-p65                pNF-IL-6-CAT        Med 
HSDB-NF-IL-6      pCMV4T-rC/EBPbeta         pNF-IL-6-CAT        Low 
HSDB-NF-IL-6      both                      pNF-IL-6-CAT        Med 


It will be readily apparent from the results 
indicated in Tables 13 and 14 that the syngenes that bind 
to NF-IL6 and NF-κB sites are expected to selectively 
block the effect of the transcription factors NF-IL6 and 
NF-κB. In the case of the MHC-I gene reporter construct, 
only the HSDB directed toward the NF-κB site of the MHC 
gene regulates the MHC promoter element, since that 
promoter has no NF-IL6 site and only one NF-κB site, 
specifically the one called NF-κB-MHC. The MHC gene 
promoter in this reporter is only activated by p65, the 
gene product of the NF-κB family gene known as Rel A. 
The MHC gene promoter has no NF-IL6 site and thus is not 
responsive to the co-expression of the NF-IL6 protein. 
Syngenes constructed with HSDB domains directed against 
the NF-κB-MHC site can antagonize the activity of p65, 
but syngenes constructed with HSDB domains directed 
against other NF-κB sites have little or no effect on the 
transcription of the MHC reporter. This demonstrates one 
of the advantages of the use of syngenes: syngenes have 
the capacity to discriminate between closely related 
binding sites of the same transcription factor. 

Furthermore, it is apparent from the expected 
results in Tables 13 and 14 that those genes that have 
both NF-κB sites and NF-IL6 sites are partially 
responsive to the respective transcription factors and 
that the combination of p65 and NF-IL6 leads to 
synergistic activation of the IL-6 and IL-8 reporter gene 
constructs. It is also important to note that both of 
these reporter constructs are expected to respond poorly 
to the NF-IL6 protein when the syngene that expresses the 
HSDB directed against the NF-κB site is co-expressed. 
The synergism between the C/EBPβ and p65 proteins is 
partially blocked by the HSDB-NF-IL-6 gene product or the 
HSDB-NFκB-IL6 gene product. However, the HSDB-NFκB-IL8 
or HSDB-NFκB-MHC gene products should have little or no 
effect on the expression of the IL-6 reporter gene. 

Thus, it is clear that the HSDB syngene 
products are expected to specifically and predictably 
suppress the expression of those gene promoters 
containing sites homologous to those of the respective 
target DNA fragments. 

6.9. EXAMPLE: PREPARATION OF TSAR LIBRARIES 

TSAR libraries have been prepared as set forth below. 

6.9.1. PREPARATION OF THE TSAR-9 LIBRARY 
6.9.1.1. SYNTHESIS AND ASSEMBLY OF OLIGONUCLEOTIDES 

Figure 5 shows the nucleotides and the assembly 
scheme used in construction of the TSAR-9 library. As 
can be seen, the TSAR-9 library contains peptides with 
the amino acid sequence S(R/S)X₁₈PGX₁₈SR (SEQ ID NO: 200), 
in which X is an unpredictable amino acid. The 
oligonucleotides were synthesized with an Applied 
Biosystems 380a synthesizer (Foster City, CA), and the 
full-length oligonucleotides were purified by HPLC. 

Five micrograms of each of the pairs of 
oligonucleotides were mixed together in buffer (50 mM 
KCl, 10 mM Tris-HCl, pH 8.3, 0.001% gelatin, 1.5 mM 
MgCl₂) with 2 mM dNTPs and 20 units of Taq DNA 
polymerase. The assembly reaction mixtures were 
incubated at 72°C for 30 seconds and then 30°C for 30 
seconds; this cycle was repeated 60 times. It should be 
noted that the assembly reaction is not PCR, since a 
denaturation step was not used. Fill-in reactions were 
carried out in a thermal cycling device (Ericomp, 
La Jolla, CA) with the following protocol: 30 seconds at 
72°C, 30 seconds at 30°C, repeated for 60 cycles. The 
lower temperature allows for annealing of the six-base 
complementary region between the two sets of the 
oligonucleotide pairs. The reaction products were 
phenol/chloroform extracted and ethanol precipitated. 
Greater than 90% of the nucleotides were found to have 
been converted to double stranded synthetic 
oligonucleotides. 

After resuspension in 300 µl of buffer 
containing 10 mM Tris-HCl, pH 7.5, 1 mM EDTA (TE buffer), 
the ends of the oligonucleotide fragments were cleaved 
with Xba I and Xho I (New England BioLabs, Beverly, MA) 
according to the supplier's recommendations. The 
fragments were purified by 4% agarose gel 
electrophoresis. The band of correct size was removed 
and electroeluted, concentrated by ethanol precipitation 
and resuspended in 100 µl TE buffer. Approximately 5% of 
the assembled oligonucleotides can be expected to have 
internal Xho I or Xba I sites; however, only the 
full-length molecules were used in the ligation step of the 
assembly scheme. The concentration of the synthetic 
oligonucleotide fragments was estimated by comparing the 
intensity on an ethidium bromide stained gel run along 
with appropriate quantitated markers. All DNA 
manipulations not described in detail were performed 
according to Sambrook, Fritsch and Maniatis, 1989, 
Molecular Cloning: A Laboratory Manual, 2d ed., Cold 
Spring Harbor Laboratory Press. 
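
The estimate above that roughly 5% of assembled oligonucleotides carry an internal Xho I or Xba I site can be checked with a back-of-the-envelope calculation. The following Python sketch is illustrative only and is not part of the described procedure. It assumes two random segments of 54 nucleotides each (the TSAR-9 segment length stated in Section 6.9.3), uniform and independent bases (an approximation, since the NNB scheme is not uniform), and independent 6-base windows; the function name is hypothetical.

```python
# Rough estimate of how often a random insert contains an internal
# 6-base restriction site. Assumptions (not from the specification):
# uniform, independent bases and independent sequence windows.

def internal_site_probability(random_len, n_segments, n_enzymes, site_len=6):
    """Probability that at least one recognition site occurs by chance."""
    windows_per_segment = random_len - site_len + 1
    total_windows = windows_per_segment * n_segments * n_enzymes
    p_single_window = 0.25 ** site_len  # chance one window matches one site
    return 1.0 - (1.0 - p_single_window) ** total_windows

# Two 54-nucleotide random segments, two enzymes (Xho I and Xba I).
p = internal_site_probability(random_len=54, n_segments=2, n_enzymes=2)
print(f"Estimated fraction with an internal site: {p:.1%}")
```

Under these simplifying assumptions the estimate falls in the same range as the approximately 5% figure quoted above.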

To demonstrate that the assembled, enzyme-digested 
oligonucleotides could be ligated, the 
synthesized DNA fragments were examined for their ability 
to self-ligate. The digested fragments were incubated 
overnight at 18°C in ligation buffer with T4 DNA ligase. 
When the ligation products were examined by agarose gel 
electrophoresis, a concatamer of bands was visible upon 
ethidium bromide staining. As many as five different 
unit length concatamer bands (i.e., dimer, trimer, 
tetramer, pentamer, hexamer) were evident, suggesting 
that the synthesized DNA fragments were efficient 
substrates for ligation. 

6.9.1.2. CONSTRUCTION OF VECTORS 

The construction of the M13 derived phage 
vectors useful for expressing a TSAR library has been 
recently described (Fowlkes et al., 1992, BioTechniques 
13:422-427). To express the TSAR-9 library, an M13 
derived vector, m663, was constructed as described in 
Fowlkes et al. (id.). Figure 6 illustrates the m663 
vector containing the pIII gene having a c-myc epitope, 
i.e., as a stuffer fragment, introduced at the mature 
N-terminal end, flanked by Xho I and Xba I restriction 
sites (see also Figure 1 of Fowlkes et al. (id.)). 

6.9.2. EXPRESSION OF THE TSAR-9 LIBRARY 

The synthesized oligonucleotides were then 
ligated to Xho I and Xba I double-digested m663 RF DNA 
containing the pIII gene (Fowlkes, supra) by incubation 
with ligase overnight at 12°C. More particularly, 50 ng 
of vector DNA and 5 ng of the digested synthesized DNA 
were mixed together in 50 µl ligation buffer (50 mM Tris, 
pH 8.0, 10 mM MgCl₂, 20 mM DTT, 0.1 mM ATP) with T4 DNA 
ligase. After overnight ligation at 12°C, the DNA was 
concentrated by ethanol precipitation and washed with 70% 
ethanol. The ligated DNA was then introduced into E. 
coli (DH5αF'; GIBCO BRL, Gaithersburg, MD) by 
electroporation. 

A small aliquot of the electroporated cells was 
plated and the number of plaques counted to determine 
that 10⁸ recombinants were generated. The library of E. 
coli cells containing recombinant vectors was plated at a 
high density (~400,000 per 150 mm petri plate) for a 
single amplification of the recombinant phage. After 8 
hr, the recombinant bacteriophage were recovered by 
washing each plate for 18 hr with SMG buffer (100 mM 
NaCl, 10 mM Tris-HCl, pH 7.5, 10 mM MgCl₂, 0.05% gelatin) 
and, after the addition of glycerol to 50%, were frozen at 
-80°C. The TSAR-9 library thus formed had a working 
titer of ~2 × 10¹¹ pfu/ml. 

6.9.3. PREPARATION OF TSAR-12 LIBRARY 

Figure 7 shows the formula for the synthetic 
oligonucleotides and the assembly scheme used in the 
construction of the TSAR-12 library. As can be seen, the 
TSAR-12 library contains peptides with the amino acid 
sequence S(S/T)X₁₀φGδX₁₀TR (SEQ ID NO: 201), in which X is 
an unpredictable amino acid, φ is S, R, G, C, or W, and δ 
is V, A, P, E, or G. As shown in Figure 7, the TSAR-12 
library was prepared in substantially the same manner as 
the TSAR-9 library described in Section 6.9.1 and its 
subsections above, with the following exceptions: (1) 
each of the variant non-predicted oligonucleotide 
sequences, i.e., NNB, was 30 nucleotides in length, 
rather than 54 nucleotides; (2) the restriction sites 
included at the 5' termini of the variant, non-predicted 
sequences were Sal I and Spe I, rather than Xho I and Xba 
I; and (3) the invariant sequence at the 3' termini to 
aid annealing of the two strands was GCGGTG rather than 
CCAGGT (5' to 3'). 
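
The properties of a degenerate codon scheme such as NNB can be enumerated directly. The sketch below is illustrative and not part of the described procedure; it assumes the standard genetic code and the IUPAC ambiguity codes (N = A, C, G or T; B = C, G or T; K = G or T), and the helper names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC ambiguity codes used by the schemes compared here.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "B": "CGT", "K": "GT"}

def expand(scheme):
    """List every concrete codon matched by a 3-letter degenerate scheme."""
    return ["".join(c) for c in product(*(IUPAC[s] for s in scheme))]

def summarize(scheme):
    """Return (number of codons, stop codons, distinct amino acids)."""
    translated = [CODON_TABLE[c] for c in expand(scheme)]
    stops = translated.count("*")
    distinct = len(set(translated) - {"*"})
    return len(translated), stops, distinct

for scheme in ("NNN", "NNB", "NNK"):
    print(scheme, summarize(scheme))
```

Under the standard code, NNB spans 48 codons with a single stop codon (TAG) while still covering all 20 amino acids, compared with three stop codons among the 64 codons of NNN.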
After synthesis, including numerous rounds of 
annealing and chain extension in the presence of dNTPs 
and Taq DNA polymerase, and purification as described 
above in Section 6.9.1.1, the synthetic double stranded 
oligonucleotide fragments were digested with Sal I and 
Spe I restriction enzymes and ligated with T4 DNA ligase 
to the nucleotide sequence encoding the M13 pIII gene 
contained in the m663 vector to yield a library of 
TSAR-12 expression vectors as described in Section 6.9.3. 
The ligated DNA was then introduced into E. coli (DH5αF'; 
GIBCO BRL, Gaithersburg, MD) by electroporation. The 
library of E. coli cells was plated at high density 
(~400,000 per 150 mm petri plate) for amplification of 
the recombinant phage. After about 8 hr, the recombinant 
bacteriophage were recovered by washing for 18 hr with 
SMG buffer (100 mM NaCl, 10 mM Tris-HCl, pH 7.5, 10 mM 
MgCl₂, 0.05% gelatin) and, after the addition of glycerol 
to 50%, were frozen at -80°C. 

The TSAR-12 library thus formed had a working 
titer of ~2 × 10¹¹ pfu/ml. 

The inserted synthetic oligonucleotides for 
each of the TSAR libraries described in Section 6.9 
above had a potential coding complexity of 20³⁶ (~10⁴⁷), 
and since ~10¹⁴ molecules were used in each transformation 
experiment, each member of these TSAR libraries should be 
unique. After plate amplification the library solution 
or stock has 10⁴ copies of each member/ml. 
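
The uniqueness argument above can be made concrete with a short calculation. The following Python sketch is an illustration under stated assumptions, not part of the specification: it treats all 20 amino acids as equally likely at each of 36 positions (the NNB scheme only approximates this) and uses the birthday approximation n²/(2C) for the expected number of coinciding pairs among n molecules drawn from C possible sequences. The function names are hypothetical.

```python
def coding_complexity(n_positions, alphabet=20):
    """Number of distinct peptides with n_positions unpredictable residues."""
    return alphabet ** n_positions

def expected_duplicate_pairs(n_molecules, complexity):
    """Birthday approximation for the expected number of identical pairs."""
    return n_molecules ** 2 / (2 * complexity)

complexity = coding_complexity(36)                 # 20^36
duplicates = expected_duplicate_pairs(1e14, complexity)

print(f"potential coding complexity: {complexity:.2e}")
print(f"expected duplicate pairs among 1e14 molecules: {duplicates:.1e}")
```

Because the expected number of duplicate pairs is many orders of magnitude below one, essentially every molecule sampled from such a library encodes a distinct sequence, consistent with the statement above.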

6.9.4. PREPARATION OF THE TSAR-13 AND 
TSAR-14 SEMIRIGID LIBRARIES 

The following example illustrates yet another 
embodiment of a TSAR library expressing peptides that can 
form semirigid structures. The coding scheme which 
encodes the variant residues in the oligonucleotides of 
this embodiment differs from that of the linear libraries 
described hereinabove. 

6.9.4.1. SYNTHESIS AND ASSEMBLY OF OLIGONUCLEOTIDES 

Figure 8 shows the nucleotides used in the 
TSAR-13 and TSAR-14 libraries and the assembly scheme 
used in construction of the TSAR-13 library. The same 
oligonucleotide design was used for both TSAR-13 and 
TSAR-14 libraries. TSAR-13 was expressed in phagemid; 
TSAR-14 was expressed in phage. The oligonucleotides 
were designed to contain invariant nucleotides flanking 
contiguous sequences of unpredicted nucleotides. In this 
example, the single stranded nucleotide sequences, when 
converted to double stranded oligonucleotides, encode: 
(a) 5' to 3': Restriction site - cysteine, glycine - (NNK)₈ 
- Gly-Cys-Gly - (NNK)₈ - Complementary site Gly-Cys-Gly; 
and (b) 3' to 5': Complementary site Gly-Cys-Gly - (NNM)₈ - 
Gly-Cys-Gly-Pro-Pro-Gly - Restriction site. Thus the 
library is designed to have semirigid binding domains, 
each containing four cysteine residues that will form 
disulfide bonds in an oxidizing environment and adopt 
cloverleaf configurations. The additional proline 
residues were included to form a kink between the TSAR 
binding domain and the pIII or effector domain. In the 
design of the single stranded nucleotides, all 4 possible 
codons for glycine were utilized to help ensure that the 
two single stranded nucleotides would anneal at the 
intended complementary glycine, cysteine, glycine 
encoding nucleotide sequence. 
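
The relationship between the NNK and NNM codon notation in the two strands can be illustrated as follows. This Python sketch is an interpretation offered for clarity, not part of the specification: it assumes IUPAC ambiguity codes (K = G or T; M = A or C; N = any base) and that strand (b), written 3' to 5', is base-paired position by position with a top strand read 5' to 3'. The helper names are hypothetical.

```python
# Base-pairing partners, including the degenerate codes used here.
# K (G or T) pairs with M (C or A); N pairs with N.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
              "K": "M", "M": "K", "N": "N"}

def complement(seq):
    """Partner strand, position by position (opposite polarity implied)."""
    return "".join(COMPLEMENT[base] for base in seq)

def reverse_complement(seq):
    """Partner strand rewritten in the same 5'-to-3' sense as the input."""
    return complement(seq)[::-1]

# Strand (b) region written 3' to 5' as (NNM) repeated 8 times.
bottom_3_to_5 = "NNM" * 8
top_5_to_3 = complement(bottom_3_to_5)
print(top_5_to_3)  # the top strand reads as NNK codons after fill-in
```

On this reading, the (NNM)₈ region of strand (b) directs synthesis of a top strand that reads as (NNK)₈, so all three variable regions are encoded by NNK codons in the coding strand.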

The oligonucleotides were synthesized with an 
Applied Biosystems 380a synthesizer (Foster City, CA) and 
the full length oligonucleotides were purified by gel 
electrophoresis. 

To anneal the pairs of oligonucleotides, 200 
pmol of each of the pair of oligonucleotides were mixed 
together in Sequenase™ buffer (40 mM Tris pH 7.5, 20 mM 
MgCl₂, 50 mM NaCl) with 0.1 µg/ml BSA and 10 mM DTT in a 
total volume of 200 µl. The mixture was incubated at 42°C 
for 5 minutes, then at 37°C for 15 minutes. Fill-in 
reactions were carried out by adding all four dNTPs to a 
concentration of 0.2 mM each and 20 units of Sequenase™ 
[Version 2.0 (U.S. Biochemical, Cleveland, Ohio)] and 
incubating at 37°C for 15 minutes. Residual polymerase 
activity was heat inactivated by a 2 hour incubation at 
65°C. After cooling, the ends of the oligonucleotide 
fragments were cleaved by adding restriction buffer (10 
mM Tris pH 7.5, 50 mM NaCl, 10 mM MgCl₂), an additional 
0.1 µg/ml BSA and an additional 2 mM DTT along with 300 
units each of Xba I and Xho I. Three control reactions were 
run simultaneously. To the first control aliquot, i.e., 
a 10 µl aliquot of the fill-in reaction, the first 
restriction enzyme (Xba I) was added to the same final 
concentration (units/µl). To the second control aliquot, 
the other restriction enzyme (Xho I) was added. No 
restriction enzyme was added to the third control 
aliquot. All samples were incubated for 2 hours at the 
temperature recommended by the restriction enzyme 
manufacturer. The cleaved oligonucleotides were 
extracted with an equal volume of 1:1 phenol/chloroform and 
ethanol precipitated. Fragments were purified on a 15% 
non-denaturing preparatory polyacrylamide gel in 1X TBE. 
The band of the correct size (as determined by comparison 
with control samples) was removed, isolated, ethanol 
precipitated and resuspended in TE buffer. 

The recovered oligonucleotides can be inserted 
into an appropriate phage vector as described above, or 
into an appropriate phagemid vector, as described in 
Section 6.10. 

6.9.5. PREPARATION OF THE R26 LIBRARY 

The R26 expression library was constructed 
essentially as described for the TSAR-9 library in 
Section 6.9 and its subsections, except for the 
modifications depicted in Figure 9. The oligonucleotide 
assembly process depicted in Figure 9 results in 
expression of peptides with the following amino acid 
sequence: 

S(S/R)X₁₂πAδX₁₂SR, where π = S, P, T or A; and δ 
= V, A, D, E or G (SEQ ID NO: 202) 

6.9.6. PREPARATION OF THE DC43 LIBRARY 

The DC43 expression library was constructed 
essentially as described for the TSAR-9 library in 
Section 6.9.1 and its subsections, except for the 
modifications depicted in Figure 13. The oligonucleotide 
assembly process depicted in Figure 13 results in 
expression of peptides with the following amino acid 
sequence: 

HSS(S/R)X₂₀GCGX₃₀SRIEGRARPSR (SEQ ID NO: 203) 



6.10. CONSTRUCTION OF VECTOR PDAF1 

The vector pDAF1 is constructed as follows: 

To create the phagemid vector pDAF1, a segment 
of the M13 gene III was transferred into the Bluescript 
II SK+ vector (GenBank #52328). This vector replicates 
autonomously in bacteria, has an ampicillin drug 
resistance marker, and has the f1 origin of replication, which 
allows the vector under certain conditions to be 
replicated and packaged into M13 particles. These M13 
viral particles would carry both wild-type pIII molecules 
encoded by helper phage and recombinant pIII molecules 
encoded by the phagemid. These phagemids express only 
one to two copies of the recombinant pIII molecule and 
have been termed monovalent display systems (see Garrard 
et al., 1991, Biotechnol. 9:1373-1377). Rather than 
express the entire gene III, this vector has a truncated 
form of gene III [see generally, Lowman et al., 1991 
(Biochemistry 30:10832-10838), which demonstrated that 
human growth hormone was more accessible to monoclonal 
antibodies when it was displayed at the NH₂-terminus of a 
truncated form of pIII protein than at the NH₂-terminus of 
the full-length form]. In the phagemid vector 
constructed here, the TSAR oligonucleotides are expressed 
at the mature terminus of a truncated pIII molecule, 
which corresponds to amino acids 198 to 406 of the mature 
pIII molecule. 

The preferred vector is pDAF, which encodes 
amino acids 198-406 of the pIII protein, a short 
polylinker within the pIII gene and the linker 
gly-gly-gly-ser between the polylinker and the pIII molecule. 
This plasmid expresses pIII from the promoter and 
utilizes the PelB leader sequence for direction of pIII's 
compartmentalization to the bacterial membrane for proper 
M13 viral assembly. 

A pair of oligonucleotides was designed, 
CGTTACGAATTCTTAAGACTCCTTATTACGCA (SEQ ID NO: 204) and 
CGTTAGGATCCCCATTCGTTTCTGAATATCAA (SEQ ID NO: 205), to 
amplify a portion (aa 198-406) of the pIII gene from 
M13mp8 DNA via PCR. Since these oligonucleotides carried 
Bam HI and Eco RI sites near the 5' termini, the PCR 
product was then digested with Bam HI and Eco RI, ligated 
with pBluescript II SK+ DNA digested with the same 
enzymes, and introduced into E. coli by transformation. 
After the recombinant was identified, an additional 
double-stranded DNA segment was cloned into it, encoding 
the PelB signal leader with an upstream ribosome binding 
site. This segment was prepared by PCR from E. coli DNA 
using the oligonucleotides 
GCGACGCGACGACXTTCGACTGCAAATTCTATTTCAA (SEQ ID NO: 206) and 
CTAATGTCTAGAAAGCTTCTCGAGCCCTGCAGCTG 
(SEQ ID NO: 207). The termini of the PCR product 
introduced a short polylinker of Pst I, Xho I, Hind III, 
and Xba I sites into the vector. The Xho I and Xba I 
sites were positioned so that assembled TSAR 
oligonucleotides could be cloned and expressed in the 
same reading frame as in the phage vectors described 
above. The third and final segment of DNA introduced 
into the vector encoded the linker sequence GGGGS (SEQ 
ID NO: 56) between the polylinker and gene III. This 
linker matches a repeated sequence motif of the pIII 
molecule and was included in the chimeric gene to create 
a swivel point separating the expressed peptide and the 
pIII protein molecule. This vector has been named pDAF1. 
Figure 10A schematically illustrates the pDAF1 phagemid 
vector. 

6.10.1. CONSTRUCTION OF VECTORS PDAF2 AND PDAF3 

The vectors pDAF2 and pDAF3 are prepared from 
pDAF1 but differ from the parent vector in that each 
contains the c-myc encoding sequence at the NH₂ and COOH 
terminal sides, respectively, of the polylinker of Pst I, 
Xho I, Hind III and Xba I restriction sites. Figures 10B 
and C schematically illustrate the phagemid vectors pDAF2 
and pDAF3. The pDAF2 and pDAF3 vectors are constructed 
as shown schematically in Figure 10D. 

6.10.2. CONSTRUCTION OF THE PHAGEMID VECTORS 

The construction of the phagemid vector pDAF1 
is described in Section 6.10. This vector was modified 
to include the full length pIII gene by inserting the 
amino-terminus of the pIII gene from m666. Both pDAF3 
and m666 were cut with AlwN I and Xba I, and a 0.7 kb 
fragment was transferred from m666 to pDAF3 to generate 
the vector pFLP3. 

6.11. EXPRESSION OF THE TSAR-13 PHAGEMID LIBRARY 

The synthesized oligonucleotides were ligated 
to Xba I, Xho I double-digested pFLP3 DNA and electroporated 
into XL1 blue E. coli. An aliquot was plated and the 
titer was determined to be 8 × 10⁷ total colonies. The 
entire library was plated on ampicillin plates. To 
express the TSAR-13 library, 7 × 10¹⁰ cells were added to 
30 ml of 2xYT media and incubated for 30 min at 37°C, 
after which 4 × 10¹¹ pfu of M13K07 helper phage were 
added. Aliquots were induced by adding 0.004% IPTG, 2% 
glucose or nothing and incubated for 1 hr. Then 150 ml 
of 2xYT media plus 70 µg/ml of kanamycin were added and 
the cells were further incubated for 4 hr at 37°C. 
Phagemid particles were PEG precipitated, collected by 
centrifugation and resuspended in 4 ml of media. The 
titer of each was 2 × 10¹¹ pfu/ml. The total number of 
recombinants was 8 × 10⁷. 

The inserted synthetic oligonucleotides for the 
TSAR-13 library had a potential coding complexity of 20²⁴ 
(1.68 × 10³¹), corresponding to the 24 unpredictable codons 
of the design, and since 10¹¹ molecules were used in each 
transformation experiment, each member of the library 
should be unique. 

6.12. CONSTRUCTION OF PHAGE VECTORS 

To express the TSAR-14 library, a member of the 
TSAR-9 library, as described in Section 6.9.1 and its 
subsections, above, was modified by cutting out the 
polylinker using Eco RI and Hind III and inserting pUC18 
polylinker (previously modified by deleting the Xba I 
site) to produce blue plaques. 

6.12.1. EXPRESSION OF THE TSAR-14 PHAGE LIBRARY 

The synthesized oligonucleotides were then 
ligated to Xba I, Xho I double digested m666 containing 
the pIII gene as described for the TSAR-9 library. The 
ligated DNA was then introduced into E. coli cells by 
electroporation. 

6.13. PREPARATION OF THE R8C LIBRARY 

A random peptide expression library, termed R8C, 
was prepared as depicted in Figure 11. The 
oligonucleotide assembly process depicted in Figure 11 
predominantly yields an expressed peptide with a random 
8-mer sequence, comprising the following amino acid 
sequence expressed at the amino terminus of pIII: 
SSC(X)₈CGSR (SEQ ID NO: 208). However, a percentage of 
the library contains a double insert resulting in 
expression of a peptide with a random 16-mer sequence 
(see Figure 12), comprising the following amino acid 
sequence expressed at the amino terminus of pIII: 
SSC(X)₈CGSRST(X)₈TTR... (SEQ ID NO: 209). 

6.14. IDENTIFICATION OF LIGAND BINDING TSARS 

In several series of experiments, the TSAR-9, 
TSAR-12, TSAR-13, TSAR-14, R8C, and R26 libraries 
described above were screened for expressed 
proteins/peptides having binding specificity for a 
variety of different ligands of choice. 

6.14.1. METHODS FOR SCREENING 

The following methods were employed to screen 
the libraries, except as otherwise noted. 

The ligand of choice was conjugated to magnetic 
beads, obtained from one of two sources: Amine 
Terminated particulate supports, #8-4100B (Advanced 
Magnetics, Cambridge, MA) and Dynabeads M-450, 
tosylactivated (Dynal, Great Neck, NY), according to the 
instructions of the manufacturer. To block any unreacted 
groups and non-specific binding to the beads, the beads 
were incubated with excess bovine serum albumin (BSA). 
The beads were then washed with numerous cycles of 
suspension in PBS-0.05% Tween 20, and recovered with a 
strong magnet. The beads were then stored at 4°C until 
needed. 

In the screening experiments, 1 ml of library 
was mixed with 100 µl of resuspended beads (1-5 mg/ml). 
The tube contents were tumbled at 4°C for 1-2 hrs. The 
magnetic beads were then recovered with a strong magnet 
and the liquid was removed by aspiration. The beads were 
then washed by adding 1 ml of PBS-0.05% Tween 20, 
inverting the tube several times to resuspend the beads, 
drawing the beads to the tube wall with the magnet and 
removing the liquid contents. The beads were washed 
repeatedly 5-10 additional times. Fifty µl of 50 mM 
glycine-HCl (pH 2.2), 100 mg/ml BSA solution were added 
to the washed beads to denature proteins and release 
bound phage. After 5-10 minutes, the beads were pulled 
to the side of the tubes with a strong magnet and the 
liquid contents then transferred to clean tubes. To the 
tubes, 100 µl of 1 M Tris-HCl (pH 7.5) or 1 M NaH₂PO₄ (pH 7) 
was added to neutralize the pH of the phage sample. The 
phage were then serially diluted from 10⁰ to 10⁻⁶ and 
aliquots plated with E. coli DH5αF' cells to determine 
the number of plaque forming units of the sample. In 
certain cases, the platings were done in the presence of 
XGal and IPTG for color discrimination of plaques (i.e., 
lacZ⁺ plaques are blue, lacZ⁻ plaques are white). The 
titer of the input samples was also determined for 
comparison (dilutions were generally 10⁻⁶ to 10⁻⁹). 
Successful screening experiments have generally 
involved 3 rounds of serial screening conducted in the 
following manner. First, the library was screened and 
the recovered phage rescreened immediately. Second, the 
phage that were recovered after the second round were 
plate amplified, according to Maniatis. The phage were 
eluted into SMG buffer (100 mM NaCl, 10 mM Tris-HCl, pH 
7.5, 10 mM MgCl₂, 0.05% gelatin) by overlaying the plates 
with ~5 ml of SMG buffer and incubating the plates at 4°C 
overnight. Third, a small aliquot was then taken from 
the plate and rescreened. The recovered phage were then 
plated at a low density to yield isolated plaques for 
individual analysis. 

The individual plaques were picked with a 
toothpick and used to inoculate cultures of E. coli F 
cells in 2xYT. After overnight culture at 37°C, the 
cultures were then spun down by centrifugation. The 
liquid supernatant was then transferred to a clean tube 
and served as the phage stock. Generally, it has a titer 
of 10¹² pfu/ml, which is stable at 4°C. Individual phage 
aliquots were then retested for their binding to the 
ligand coated beads and their lack of binding to other 
control beads (i.e., BSA coated beads, or beads 
conjugated with other ligand). 



The present invention is not to be limited in 
scope by the specific embodiments described herein. 
Indeed, various modifications of the invention in 
addition to those described herein will become apparent 
to those skilled in the art from the foregoing 
description and accompanying figures. Such modifications 
are intended to fall within the scope of the appended 
claims. 

Various publications are cited herein, the 
disclosures of which are incorporated by reference in 
their entireties. 
WHAT IS CLAIMED IS: 



1. A pharmaceutical composition comprising: 

(a) a nucleic acid encoding a protein, said 
protein comprising a binding domain which 
binds to a ligand of choice, in which the 
nucleotide sequence encoding said binding 
domain is a sequence identified by a 
method comprising screening a library of 
recombinant vectors, said vectors 
comprising unpredictable nucleotides 
arranged in one or more contiguous 
sequences; and 

(b) a pharmaceutically acceptable carrier. 

2. The composition of claim 1 where the total 
number of unpredictable nucleotides is greater than or 
equal to about 15 and less than or equal to about 600. 

3. A pharmaceutical composition comprising: 

(a) a nucleic acid encoding a protein, said 
protein comprising a binding domain which 
binds to a ligand of choice, in which the 
nucleotide sequence encoding said binding 
domain is a sequence identified by a 
method comprising screening a library of 
recombinant vectors, said vectors encoding 
a plurality of heterofunctional fusion 
polypeptides, said fusion polypeptides 
comprising: 

(i) a binding region encoded by a 
first oligonucleotide comprising 
unpredictable nucleotides in 
which the unpredictable 
nucleotides are arranged in one 
or more contiguous sequences; 
and 

(ii) an effector domain encoded by a 
second oligonucleotide, said 
effector domain being a peptide 
that enhances expression or 
detection of the binding region; 
and 

(b) a pharmaceutically acceptable carrier. 

4. A pharmaceutical composition comprising:

    (a) a nucleic acid encoding a protein, said
        protein comprising a binding domain which
        binds to a ligand of choice, in which the
        nucleotide sequence encoding said binding
        domain is a sequence identified by a
        method comprising screening a library of
        recombinant vectors, said vectors encoding
        a plurality of heterofunctional fusion
        polypeptides, said fusion polypeptides
        comprising:

        (i) a binding region encoded by a
            first oligonucleotide comprising
            unpredictable nucleotides in
            which the unpredictable
            nucleotides are arranged in one
            or more contiguous sequences and
            the contiguous sequences are
            flanked by invariant
            residues designed to encode
            amino acids that confer a
            desired structure to the binding
            region; and

        (ii) an effector domain encoded by a
            second oligonucleotide, said
            effector domain being a peptide
            that enhances expression or
            detection of the binding region;
            and

    (b) a pharmaceutically acceptable carrier.

5. A pharmaceutical composition comprising:

    (a) a nucleic acid encoding a protein,
        said protein comprising a binding
        domain which binds to a ligand of
        choice, in which the amino acid
        sequence of said binding domain is
        identified by a method comprising
        screening a chemically synthesized
        peptide library, in which the
        peptides of said library comprise one
        or more contiguous sequences of
        unpredictable amino acids, wherein
        the total number of unpredictable
        amino acids is greater than or equal
        to 5 and less than or equal to 25;
        and

    (b) a pharmaceutically acceptable
        carrier.

6. The pharmaceutical composition of claim 1
in which the coding strand of the unpredictable
nucleotides comprises the formula (NNB)_{n+m} where

    N is A, C, G or T;
    B is G, T or C; and
    n and m are integers, such that
    20 ≤ n + m ≤ 200.

7. The composition of claim 1 in which said
nucleic acid is at least part of an expression vector
which expresses said nucleic acid in a suitable host
cell.

8. The composition of claim 1 in which said
nucleic acid comprises terminal sequences at each end
which promote homologous recombination with genomic
sequences.

9. The composition of claim 1 in which said
protein further comprises a nuclear localization signal.

10. The composition of claim 1 in which said
protein further comprises a transcriptional activation
signal.

11. The composition of claim 1, in which said
ligand is selected from the group consisting of a
non-ionic chemical group, an ion, a metal, a protein or
portion thereof, a peptide or portion thereof, a nucleic
acid or portion thereof, a carbohydrate, a lipid, a viral
particle or portion thereof, a membrane vesicle or
portion thereof, a cell wall component, a synthetic
organic compound, a bioorganic compound and an inorganic
compound.

12. The composition of claim 1, in which said
ligand is a ligand which binds to a naturally occurring
receptor selected from the group consisting of the
variable region of an antibody, an enzyme/substrate
binding site, an enzyme/co-factor binding site, a
regulatory DNA binding protein, an RNA binding protein, a
binding site of a metal binding protein, a nucleotide
fold or GTP binding protein, a calcium binding protein, a
membrane protein, a viral protein and an integrin.

13. The composition of claim 1 in which said
ligand is selected from the group consisting of a
molecule comprising a transcriptional regulatory site on
DNA; a transcriptional regulator that binds to a
transcriptional regulatory site on DNA; and a first
protein that binds to a second protein, said second
protein being a transcriptional regulator that binds to a
transcriptional regulatory site on DNA.

14. The composition of claim 13 in which the
ligand comprises a transcriptional regulatory site on
DNA.

15. The composition of claim 14 in which the
ligand comprises a sequence selected from the group
consisting of: the sequence 5'GGGTGGGGATTCCCCATCT3' (SEQ
ID NO: 135), the sequence 5'ATGTGGGATTTTCCCATG3' (SEQ ID
NO: 137), the sequence 5'ATCGTGGAATTTCCTCTG3' (SEQ ID NO:
139), and the sequence 5'ACGTCATTGCACAATCTT3' (SEQ ID NO:
141).

16. The composition of claim 14 in which the 
transcriptional regulatory site on DNA is an NF-kB 
nucleic acid binding site. 

17. The composition of claim 16 in which said
binding domain binds to an H2kB nucleic acid binding
site, but does not substantially bind to an IL-6kB or to
an IL-8kB nucleic acid binding site.

18. The composition of claim 16 in which said
binding domain binds to an IL-6kB, IL-8kB, and H2kB
nucleic acid binding site.

19. The composition of claim 14 in which the
transcriptional regulatory site on DNA is selected from
the group consisting of a GATA transcription factor
nucleic acid binding site, an AP-1 nucleic acid binding
site, and an ATF nucleic acid binding site.

20. The composition of claim 13 in which said
ligand is NF-kB.

21. The composition of claim 13 in which said
ligand is IF-kB.

22. A pharmaceutical composition comprising:

    (a) a protein comprising a binding domain
        which binds to a ligand of choice, in
        which the amino acid sequence encoding
        said binding domain is an amino acid
        sequence identified by a method comprising
        screening a library of recombinant
        vectors, said vectors comprising a
        nucleotide sequence encoding unpredictable
        amino acids arranged in one or more
        contiguous sequences, wherein the total
        number of unpredictable amino acids is
        greater than or equal to 20 and less than
        or equal to about 200, in which said
        ligand is selected from the group
        consisting of an NF-kB nucleic acid
        binding site, NF-kB, and IF-kB; and

    (b) a pharmaceutically acceptable carrier.

23. A nucleic acid encoding a protein, said
protein comprising a binding domain which binds to a
ligand of choice, in which the nucleotide sequence
encoding said binding domain is a sequence identified by
a method comprising screening a library of recombinant
vectors, said vectors comprising unpredictable
nucleotides arranged in one or more contiguous sequences,
wherein the total number of unpredictable nucleotides is
greater than or equal to 15 and less than or equal to
about 600 and in which said protein further comprises a
sequence providing for in vivo or intracellular targeting
of said protein.

24. The nucleic acid of claim 23 in which said
protein further comprises a transcriptional activation
sequence.

25. The nucleic acid of claim 23 in which said
nucleotide sequence is flanked by sequences which promote
homologous recombination with genomic sequences.

26. The nucleic acid of claim 23 in which the
coding strand of the unpredictable nucleotides comprises
the formula (NNB)_{n+m} where

    N is A, C, G or T;
    B is G, T or C; and
    n and m are integers, such that
    20 ≤ n + m ≤ 200.

27. The nucleic acid of claim 23 in which said 
sequence providing for targeting is a nuclear 
localization sequence. 

28. The nucleic acid of claim 23 in which said
ligand is selected from the group consisting of a
molecule comprising a transcriptional regulatory site on
DNA; a transcriptional regulator that binds to a
transcriptional regulatory site on DNA; and a first
protein that binds to a second protein, said second
protein being a transcriptional regulator that binds to a
transcriptional regulatory site on DNA.

29. The nucleic acid of claim 28 in which the
ligand comprises a transcriptional regulatory site on
DNA.

30. The nucleic acid of claim 29 in which the
ligand comprises a sequence selected from the group
consisting of: the sequence 5'GGGTGGGGATTCCCCATCT3' (SEQ
ID NO: 135), the sequence 5'ATGTGGGATTTTCCCATG3' (SEQ ID
NO: 137), the sequence 5'ATCGTGGAATTTCCTCTG3' (SEQ ID NO:
139), and the sequence 5'ACGTCATTGCACAATCTT3' (SEQ ID NO:
141).



31. The nucleic acid of claim 29 in which the
transcriptional regulatory site on DNA is an NF-kB
nucleic acid binding site.



32. The nucleic acid of claim 31 in which said
binding domain binds to an H2kB nucleic acid binding
site, but does not substantially bind to an IL-6kB or to
an IL-8kB nucleic acid binding site.

33. The nucleic acid of claim 31 in which said
binding domain binds to an IL-6kB, IL-8kB, and H2kB
nucleic acid binding site.



34. The nucleic acid of claim 29 in which the
transcriptional regulatory site on DNA is selected from
the group consisting of a GATA transcription factor
nucleic acid binding site, an AP-1 nucleic acid binding
site, and an ATF nucleic acid binding site.

35. The nucleic acid of claim 23 in which said
ligand is NF-kB.

36. The nucleic acid of claim 23 in which said 
ligand is IF-kB. 

37. A method of modifying transcription of one
or more genes of interest comprising delivering to a cell
which is capable of expressing one or more genes of
interest a composition comprising a nucleic acid encoding
a protein, said protein comprising a binding domain which
binds to a ligand of choice, in which the nucleotide
sequence encoding said binding domain is a sequence
identified by a method comprising screening a library of
recombinant vectors, said vectors comprising
unpredictable nucleotides arranged in one or more
contiguous sequences, wherein the total number of
unpredictable nucleotides is greater than or equal to 15
and less than or equal to about 600; in which said ligand
is selected from the group consisting of a molecule
comprising a transcriptional regulatory site on DNA, a
DNA binding protein that is a transcriptional regulator,
and a protein that binds to said DNA binding protein.

38. The method of claim 37 which is carried
out in vitro.

39. The method of claim 37 which is carried
out in vivo, in which said delivering is carried out by
administering said composition to a subject.

40. The method of claim 37 in which said
ligand comprises a transcriptional regulatory site on
DNA.

41. The method of claim 40 in which the ligand
comprises a sequence selected from the group consisting
of: the sequence 5'GGGTGGGGATTCCCCATCT3' (SEQ ID NO:
135), the sequence 5'ATGTGGGATTTTCCCATG3' (SEQ ID NO:
137), the sequence 5'ATCGTGGAATTTCCTCTG3' (SEQ ID NO:
139), and the sequence 5'ACGTCATTGCACAATCTT3' (SEQ ID NO:
141).

42. The method of claim 40 in which the
transcriptional regulatory site on DNA is an NF-kB
nucleic acid binding site.

43. The method of claim 42 in which said
binding domain binds to an H2kB nucleic acid binding
site, but does not substantially bind to an IL-6kB or to
an IL-8kB nucleic acid binding site.

44. The method of claim 42 in which said
binding domain binds to an IL-6kB, IL-8kB, and H2kB
nucleic acid binding site.

45. The method of claim 40 in which the 
transcriptional regulatory site on DNA is selected from 
the group consisting of a GATA transcription factor 
nucleic acid binding site, an AP-1 nucleic acid binding 
site, and an ATF nucleic acid binding site. 






46. The method of claim 37 in which said 
ligand is a protein that is a transcriptional regulator 
that binds to a transcriptional regulatory site on DNA. 

47. The method of claim 37 in which said
ligand is a first protein that binds to a second protein,
said second protein being a transcriptional regulator
that binds to a transcriptional regulatory site on DNA.

48. The method of claim 37 in which said
protein, when expressed in a suitable cell, inhibits
transcription of one or more genes of interest.

49. The method of claim 37 in which said
protein, when expressed in a suitable cell, increases
transcription of one or more genes of interest.

50. The method of claim 46 in which said 
transcriptional regulator is NF-kB. 


51. The method of claim 47 in which said 
ligand is IF-kB. 

52. The method of claim 37 in which said
protein further comprises a nuclear localization signal.

53. The method of claim 37 in which said
protein further comprises a transcriptional activation
signal.

54. A method of modifying transcription of one
or more genes of interest comprising delivering to a cell
which is capable of expressing one or more genes of
interest a composition comprising a nucleic acid encoding
a protein, said protein comprising a binding domain which
binds to a ligand of choice, in which the amino acid
sequence of said binding domain is identified by a method
comprising screening a chemically synthesized peptide
library, in which the peptides of said library comprise
one or more contiguous sequences of unpredictable amino
acids, wherein the total number of unpredictable amino
acids is greater than or equal to 5 and less than or
equal to 25; in which said ligand is selected from the
group consisting of a molecule comprising a
transcriptional regulatory site on DNA, a DNA binding
protein that is a transcriptional regulator, or a protein
that binds to said DNA binding protein.



55. A therapeutic method comprising
administering to a subject a therapeutically effective
amount of any one of the pharmaceutical compositions of
claims 1-22.



56. A method for identifying a nucleic acid
that encodes a peptide which binds to a ligand of choice,
comprising screening a library of recombinant vectors
which express a plurality of proteins comprising a
binding domain encoded by an oligonucleotide, said
oligonucleotide comprising unpredictable nucleotides, in
which the unpredictable nucleotides are arranged in one
or more contiguous sequences, wherein the total number of
unpredictable nucleotides is greater than or equal to
about 15 and less than or equal to about 600; in which
said screening is done by a method comprising (a)
contacting the plurality of proteins with said ligand of
choice under conditions conducive to ligand binding, in
which the ligand of choice is selected from the group
consisting of a molecule comprising a transcriptional
regulatory site on DNA, a DNA binding protein that is a
transcriptional regulator; and a protein that binds to
said DNA binding protein.

57. The method of claim 56 in which said
ligand comprises a transcriptional regulatory site on
DNA.



58. The method of claim 57 in which the ligand
comprises a sequence selected from the group consisting
of: the sequence 5'GGGTGGGGATTCCCCATCT3' (SEQ ID NO:
135), the sequence 5'ATGTGGGATTTTCCCATG3' (SEQ ID NO:
137), the sequence 5'ATCGTGGAATTTCCTCTG3' (SEQ ID NO:
139), and the sequence 5'ACGTCATTGCACAATCTT3' (SEQ ID NO:
141).

59. The method of claim 57 in which the 
transcriptional regulatory site on DNA is an NF-kB 
nucleic acid binding site. 

60. The method of claim 59 in which said
binding domain binds to an H2kB nucleic acid binding
site, but does not substantially bind to an IL-6kB or to
an IL-8kB nucleic acid binding site.

61. The method of claim 59 in which said
binding domain binds to an IL-6kB, IL-8kB, and H2kB
nucleic acid binding site.

62. The method of claim 57 in which the
transcriptional regulatory site on DNA is selected from
the group consisting of a GATA transcription factor
nucleic acid binding site, an AP-1 nucleic acid binding
site, and an ATF nucleic acid binding site.

63. The method of claim 56 in which said
ligand is a DNA binding protein that is a transcriptional
regulator.

64. The method of claim 56 in which said
ligand is a protein that binds to said DNA binding
protein.

65. A method for identifying a peptide which
binds to a ligand of choice, comprising screening a
chemically synthesized peptide library, in which the
peptides of said library comprise one or more contiguous
sequences of unpredictable amino acids, wherein the total
number of unpredictable amino acids is greater than or
equal to 5 and less than or equal to 25; in which said
screening is done by a method comprising (i) contacting
the plurality of peptides with said ligand of choice
under conditions conducive to ligand binding; in which
the ligand of choice is selected from the group
consisting of a molecule comprising a transcriptional
regulatory site on DNA, a DNA binding protein that is a
transcriptional regulator, and a protein that binds to
said DNA binding protein, and (ii) recovering a peptide
which binds the ligand of choice.

66. The method according to claim 56 in which
the coding strand of the unpredictable nucleotides
comprises the formula (NNB)_{n+m} where

    N is A, C, G or T;
    B is G, T or C; and
    n and m are integers, such that
    20 ≤ n + m ≤ 200.






WO 96/06188 



PCT/US95/10523 



67. The composition of claim 1 or 2 which 
further comprises a promoter operably linked to the 
nucleic acid. 

68. A protein which binds to an NF-kB nucleic
acid binding site, encoded by a nucleic acid identified
by the method of claim 56.

69. A method of modifying the activity of one
or more genes of interest comprising delivering to a cell
which is capable of expressing one or more genes of
interest a composition comprising a nucleic acid encoding
a protein, said protein comprising a binding domain which
binds to (a) said one or more genes of interest, or (b) a
protein product encoded by said one or more genes of
interest, in which the nucleotide sequence encoding said
binding domain is a sequence identified by a method
comprising screening a random peptide library.

70. A molecule comprising a peptide having an
amino acid sequence selected from the group consisting
of:

SWCTYSGYCRVSSAGTAQRTSVDRDGM (SEQ ID NO: 143);
RTGNEQPPGSFGRAAGCFHPGCKYMKLN (SEQ ID NO: 144);
SDKYFHDIRKYHPAAATSYKTRPDMPST (SEQ ID NO: 145);
REWGVPGAHNRIRDHCNGPRCHAIRTNASHTQHISRPPD (SEQ ID NO: 146);
SIDKVQPPGTSGRKTGCFHPGCKYIKLK (SEQ ID NO: 147);
STGKEHPPGSFGRATGCFHPGCKYIKLK (SEQ ID NO: 148);
STGKEHSPGSLGRAPGCWHPGCKYIKLK (SEQ ID NO: 149);
RTGNDQPPRSFGHATGCFHPGCKYIKNK (SEQ ID NO: 150);
STGKDHAPSSFGRAAGCFHPGCKYIKHK (SEQ ID NO: 151);
RTGEDHPPGSFGKAAGCFHPGCKYIKHK (SEQ ID NO: 152);
RTGNDHPSVSYGRASGCFHAGCKYIKHK (SEQ ID NO: 153);
RTGNENPPGSLGRGTGCFHAACKYIKHK (SEQ ID NO: 154);
RAGTDQPPGSFVRASGCFHPGCKYIKHK (SEQ ID NO: 155);
RTGVDQSPESFDRATGCFHPGCKYIKHN (SEQ ID NO: 156);
RTGNDDPTGALARNSGCYHPGCKYINHK (SEQ ID NO: 157);
RTGNDNPPGSSGPASGCFHPGCKYMKHK (SEQ ID NO: 158);
STGKEHPPGSFGRVSGCFHPGCKYMKHN (SEQ ID NO: 159);
RTGNDHPSGSFEHAAGCFHPGCKYMKHK (SEQ ID NO: 160);
RTGDDHPTGSIGRASGCFHPGCKYIKHN (SEQ ID NO: 161);
STGKEHPPGSIGRASGCFHPGCKYIRHT (SEQ ID NO: 162);
RTGNEHPPGSIGRASGCFHPGCNYIKHK (SEQ ID NO: 163);
RTGNEHPPGSLGRAPGCYHPGCKYIHHP (SEQ ID NO: 164);
RTGNYDPPGSFGPSTGCPHPGCKYIHHK (SEQ ID NO: 165);
STGNEQHPGSLRRASGCFHPGCKYIHHN (SEQ ID NO: 166);
STGNEHPPGSVGHAPGCFHPGCKYMHLK (SEQ ID NO: 167);
SADLVAPSGENRSGRGCFHPGCKYIKHKPTTPPAPSSA (SEQ ID NO: 168);
SAATADLGVGERSGRGCFHPGCKYIKHKGSTSQQSQPL (SEQ ID NO: 169);
SPDTKSPGGDVRSGRGCFHPGCKYIKHKSDSSHKEATT (SEQ ID NO: 170);
SVTSDDPQGRERSGRGCFHPGCKYIKHKNHATSESPNL (SEQ ID NO: 171);
SGREPEGPSEQRSGRGCFHPGCKYIKHKTQSTMHTTPA (SEQ ID NO: 172);
SFRALDEPTATRSGCFHPGCKYIKHKSPKDSFPNTTAA (SEQ ID NO: 173);
SGTVPTEAVNNRPGCFHPGCKYIKHKLPRQPSPTPPLA (SEQ ID NO: 174);
SDWSERYGGERGGGCFHPGCKYIKHKNKNSGTPLPSDP (SEQ ID NO: 175);
SNTDTAWAGNGDRGCFHPGCKYIKHKTPPPTNKTYETP (SEQ ID NO: 176);
SGNVDNGTERGGKGCFHPGCKYIKHKRPNAYEWPPLD (SEQ ID NO: 177);
SERAERGEGNWSAGCFHPGCKYIKHKSPRGANRSLVGA (SEQ ID NO: 178);
and
SSDGDTPGGSRKGGCFHPGCKYIKHKPSPQLPREGTHN (SEQ ID NO: 179);
or a binding portion thereof.



71. A pharmaceutical composition comprising the 
molecule of claim 70 and a suitable pharmaceutical 
carrier. 



72. A method of modifying the activity of one 
or more genes of interest comprising delivering to a cell 
which is capable of expressing one or more genes of 
interest a composition comprising a nucleic acid encoding 
a protein, said protein comprising a binding domain which 
binds to (i) said one or more genes of interest, or (ii) 
a protein product encoded by said one or more genes of 
interest, in which the nucleotide sequence encoding said
binding domain is a sequence identified by a method
comprising:

    (a) screening a random peptide library to identify
        a preliminary binding domain;
    (b) subjecting the preliminary binding domain to a
        process of directed evolution to identify said
        binding domain.



73. The method of claim 74 where said binding
domain specifically binds an H2kB site.
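
The NNB coding-strand formula recited in the claims above (N is A, C, G or T; B is G, T or C; 20 to 200 triplets in total) can be illustrated with a minimal sketch. The function names below are hypothetical and are not taken from the specification; the sketch assumes that n + m counts NNB triplets and uses the standard genetic code to show that, of the three stop codons, only TAG matches the NNB pattern.

```python
import random
from itertools import product

N_BASES = "ACGT"   # N: any of the four bases
B_BASES = "GTC"    # B: G, T or C (never A) at the third position
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code

def nnb_codons():
    """Return every codon that matches the NNB pattern."""
    return ["".join(c) for c in product(N_BASES, N_BASES, B_BASES)]

def random_nnb(total_codons, rng=None):
    """Return a random coding strand of `total_codons` NNB triplets.

    `total_codons` plays the role of n + m (an assumption of this sketch).
    """
    if not 20 <= total_codons <= 200:
        raise ValueError("the claims recite 20 <= n + m <= 200")
    rng = rng or random.Random()
    return "".join(
        rng.choice(N_BASES) + rng.choice(N_BASES) + rng.choice(B_BASES)
        for _ in range(total_codons)
    )

if __name__ == "__main__":
    codons = nnb_codons()
    stops = sorted(c for c in codons if c in STOP_CODONS)
    print(len(codons), stops)          # 48 ['TAG']
    print(random_nnb(20, random.Random(0)))
```

Excluding A from the third position leaves 48 of the 64 codons and removes the TAA and TGA stop codons, so a randomized insert built this way can only be interrupted by TAG.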
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